Genes of purine biosynthesis from Ashbya gossypii and the use 
thereof in microbial riboflavin synthesis 

The present invention relates to genes of purine biosynthesis 
from Ashbya gossypii and to the use thereof in riboflavin 
synthesis. 

Vitamin B2, also called riboflavin, is essential for humans and
animals. Vitamin B2 deficiency is associated with inflammations
of the mucous membranes of the mouth and throat, itching and
inflammations in the skin folds and similar cutaneous lesions,
conjunctival inflammations, reduced visual acuity and clouding
of the cornea. Babies and children may experience cessation of
growth and loss of weight. Vitamin B2 therefore has economic
importance, especially as a vitamin supplement in cases of vitamin
deficiency and as a supplement to animal feed. It is also employed
for coloring foodstuffs, for example in mayonnaise, ice cream,
blancmange etc.

Vitamin B2 is prepared either chemically or microbially (see, for
example, Kurth et al. (1996) Riboflavin, in: Ullmann's
Encyclopedia of industrial chemistry, VCH Weinheim). In the
chemical preparation process, riboflavin is, as a rule, obtained
as pure final product in multistage processes, it being necessary
to employ relatively costly starting materials such as, for
example, D-ribose. An alternative to the chemical synthesis of
riboflavin is the preparation of this substance by
microorganisms. The starting materials used in this case are
renewable raw materials such as sugars or vegetable oils. The
preparation of riboflavin by fermentation of fungi such as
Eremothecium ashbyii or Ashbya gossypii is known (The Merck
Index, Windholz et al., eds. Merck & Co., page 1183, 1983), but
yeasts such as, for example, Candida, Pichia and Saccharomyces,
or bacteria such as, for example, Bacillus, Clostridia or
corynebacteria, have also been described as riboflavin producers.

EP 405370 describes riboflavin-overproducing bacterial strains
obtained by transformation with the riboflavin biosynthesis genes
from Bacillus subtilis. The genes described therein, and other
genes involved in vitamin B2 biosynthesis from prokaryotes, are
unsuitable for a recombinant riboflavin preparation process using
eukaryotes such as, for example, Saccharomyces cerevisiae or
Ashbya gossypii.

DE 44 20 785 describes six riboflavin biosynthesis genes from
Ashbya gossypii, and microorganisms transformed with these genes,
and the use of such microorganisms for riboflavin synthesis.

It is possible with these processes to generate producer strains
for microbial riboflavin synthesis. However, these producer
strains often have metabolic limitations which cannot be
eliminated by the inserted biosynthesis genes or are sometimes
induced thereby. Such producer strains are sometimes unable to
provide sufficient substrate for saturating some steps in the
biosynthesis, so that the biosynthetic capacity of some segments
of metabolism cannot be fully exploited.

It is therefore desirable to enhance further sections of
metabolic pathways, thereby to eliminate metabolic bottlenecks
and thus further optimize the microorganisms employed for the
microbial riboflavin synthesis (producer strains) in respect of
their ability for riboflavin synthesis. It is desirable to
identify the sections of the complex metabolism which need
enhancing and to enhance these in a suitable way.

The present invention relates to novel proteins of purine
biosynthesis, the genes therefor and the use thereof for
microbial riboflavin synthesis.

Purine metabolism (for a review, see, for example, Voet, D. and
Voet, J.G., 1994, Biochemie, VCH Weinheim, pages 743-771;
Zalkin, H. and Dixon, J.E., 1992, De novo purine nucleotide
biosynthesis, in: Progress in nucleic acid research and molecular
biology, Vol. 42, pages 259-287, Academic Press) is a part of the
metabolism which is essential for all life forms. Faulty purine
metabolism may in humans lead to serious diseases (e.g. gout).
Purine metabolism is moreover an important target for treating
oncoses and viral infections. Numerous publications have appeared
describing substances which intervene in purine metabolism for
these indications (as review, for example Christopherson, R.I.
and Lyons, S.D., 1990, Potent inhibitors of de novo pyrimidine
and purine biosynthesis as chemotherapeutic agents, Med. Res.
Reviews 10, pages 505-548).

Investigations on the enzymes involved in purine metabolism
(Smith, J.L., Enzymes in nucleotide synthesis, 1995, Curr.
Opinion Struct. Biol. 5, 752-757) aim to develop novel
immunosuppressives, antiparasitic or antiproliferative medicines
(Biochem. Soc. Transact. 23, pages 877-902, 1995).
These medicines are normally non-naturally occurring purines,
pyrimidines or compounds derived therefrom.

The present invention relates to a protein having the polypeptide
sequence depicted in SEQ ID NO: 2 or a polypeptide sequence
obtainable from SEQ ID NO: 2 by substitution, insertion or
deletion of up to 15% of the amino acids, and having the
enzymatic activity of a phosphoribosyl-pyrophosphate synthetase.

The sequence depicted in SEQ ID NO: 2 is the gene product of the
KPR1 gene (SEQ ID NO: 1) obtained from Ashbya gossypii.

The invention further relates to a protein having the polypeptide
sequence depicted in SEQ ID NO: 5 or a polypeptide sequence
obtainable from SEQ ID NO: 5 by substitution, insertion or
deletion of up to 10% of the amino acids, and having the
enzymatic activity of a glutamine-phosphoribosyl-pyrophosphate
amidotransferase.

The sequence depicted in SEQ ID NO: 5 is the gene product of the
ADE4 gene (SEQ ID NO: 3) obtained from Ashbya gossypii.

The invention further relates to a protein having the polypeptide
sequence depicted in SEQ ID NO: 8 or a polypeptide sequence
obtainable from SEQ ID NO: 8 by substitution, insertion or
deletion of up to 20% of the amino acids, and having the
enzymatic activity of an IMP dehydrogenase.

The sequence depicted in SEQ ID NO: 8 and 9 is the gene product of
the GUA1 gene (SEQ ID NO: 7) obtained from Ashbya gossypii.

The invention further relates to a protein having the polypeptide
sequence depicted in SEQ ID NO: 11 or a polypeptide sequence
obtainable from SEQ ID NO: 11 by substitution, insertion or
deletion of up to 10% of the amino acids, and having the
enzymatic activity of a GMP synthetase.

The sequence depicted in SEQ ID NO: 11 is the gene product of the
GUA2 gene (SEQ ID NO: 10) obtained from Ashbya gossypii.

The invention further relates to a protein having the polypeptide
sequence depicted in SEQ ID NO: 13 or a polypeptide sequence
obtainable from SEQ ID NO: 13 by substitution, insertion or
deletion of up to 10% of the amino acids, and having the
enzymatic activity of a phosphoribosyl-pyrophosphate synthetase.

The sequence depicted in SEQ ID NO: 13 is the gene product of the
KPR2 gene (SEQ ID NO: 12) obtained from Ashbya gossypii.



The gene products mentioned can be modified by conventional
methods of gene technology, such as site-directed mutagenesis, so
that particular amino acids are replaced, additionally inserted
or deleted. Amino acid residues are normally (but not
exclusively) replaced by those of similar volume, charge or
hydrophilicity/hydrophobicity in order not to lose the enzymatic
properties of the gene products. In particular, modifications of
the amino acid sequence in the active center frequently result
in a drastic alteration in the enzymatic activities. However,
modifications of the amino acid sequence at other, less
essential sites are often tolerated.



It is possible with the novel proteins

1. for up to 15%, preferably up to 10% and particularly preferably
up to 5%, of the amino acids to be modified, by comparison
with the sequences depicted in the sequence listing, in the case
of the gene product of the AgKPR1 gene;

2. for up to 10% and particularly preferably up to 5% of the
amino acids to be modified, by comparison with the sequences
depicted in the sequence listing, in the case of the gene
product of the AgADE4 gene;

3. for up to 20%, preferably up to 15%, particularly preferably up
to 10% and especially preferably up to 5%, of the amino acids
to be modified, by comparison with the sequences depicted in
the sequence listing, in the case of the gene product of the
AgGUA1 gene;

4. for up to 10% and particularly preferably up to 5% of the
amino acids to be modified, by comparison with the sequences
depicted in the sequence listing, in the case of the gene
product of the AgGUA2 gene;

5. for up to 10%, preferably up to 7% and particularly
preferably up to 5%, of the amino acids to be modified, by
comparison with the sequences depicted in the sequence
listing, in the case of the gene product of the AgKPR2 gene.
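
As a rough illustration of the percentage limits listed above, the
following Python sketch counts how many positions of a variant
differ from a reference sequence and compares the result with the
broadest limit for the relevant gene product. It assumes that the
two sequences have already been aligned to equal length with "-"
marking gaps, and that the percentage is taken relative to the
ungapped reference length; both are assumptions made for this
sketch, not definitions from the text. The function names and the
toy sequences are hypothetical.

```python
# Broadest limits (in percent) named in the text for each gene product.
LIMITS = {
    "AgKPR1": 15.0,
    "AgADE4": 10.0,
    "AgGUA1": 20.0,
    "AgGUA2": 10.0,
    "AgKPR2": 10.0,
}


def percent_modified(ref_aligned: str, var_aligned: str) -> float:
    """Percentage of positions changed by substitution, insertion or deletion.

    Both inputs must be pre-aligned to the same length, with '-' for gaps.
    The denominator is the ungapped reference length (an assumption).
    """
    if len(ref_aligned) != len(var_aligned):
        raise ValueError("aligned sequences must have equal length")
    ref_len = sum(1 for residue in ref_aligned if residue != "-")
    if ref_len == 0:
        raise ValueError("reference sequence is empty")
    changes = sum(1 for r, v in zip(ref_aligned, var_aligned) if r != v)
    return 100.0 * changes / ref_len


def within_limit(gene: str, ref_aligned: str, var_aligned: str) -> bool:
    """True if the variant stays within the broadest limit for this gene."""
    return percent_modified(ref_aligned, var_aligned) <= LIMITS[gene]


# Toy example: one substitution in a ten-residue reference.
reference = "MKVLAAGHTR"
variant = "MKVLAVGHTR"
print(percent_modified(reference, variant))
print(within_limit("AgADE4", reference, variant))
```

Enzymatic activity still has to be confirmed experimentally; the
check above covers only the sequence criterion.
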

Preferred proteins are those which, while they still have the
relevant enzymatic activity, have altered regulation. Many of
these enzymes are subject to a strong control of the activity by
intermediates and final products (feedback inhibition). This
leads to the activity of the enzymes being restricted as soon as
sufficient final product is present.

However, in the case of producer strains, this control, which is
economical in the physiological state, often makes it impossible
to increase the productivity beyond a certain limit. Elimination
of such feedback inhibition results in the enzymes retaining
their activity, irrespective of the final product concentration,
and thus metabolic bottlenecks are bypassed. This in the end
leads to a marked increase in riboflavin biosynthesis.

Preferred novel proteins are those no longer inhibited by
secondary products of metabolic pathways (derived from products
of the enzymes). Particularly preferred novel proteins are those
no longer inhibited by intermediates of purine biosynthesis, in
particular by purine bases, purine nucleosides, purine nucleotide
5'-monophosphates, purine nucleotide 5'-diphosphates or purine
nucleotide 5'-triphosphates. Particularly preferred novel
proteins are those with the following modifications of the amino
acid sequence and all combinations of amino acid sequence
modifications which comprise these modifications.

Modifications of the amino acid sequence of the AgKPR1 gene
product:

Lysine at position 7 replaced by valine
Aspartate at position 52 replaced by histidine
Leucine at position 131 replaced by isoleucine
Aspartate at position 186 replaced by histidine
Alanine at position 193 replaced by valine
Histidine at position 196 replaced by glutamine

Modifications of the amino acid sequence of the AgADE4 gene
product:

Aspartate at position 310 replaced by valine
Lysine at position 333 replaced by alanine
Alanine at position 417 replaced by tryptophan



The following Examples describe the preparation of the novel
proteins and nucleic acids and the use thereof for producing
microorganisms with increased riboflavin synthesis.

Example 1:

Production of a genomic gene bank from Ashbya gossypii ATCC10895

Genomic DNA from Ashbya gossypii ATCC10895 can be prepared by
conventional methods as described, for example, in WO9703208. The
genomic gene bank can be constructed starting from this DNA by
conventional methods (e.g. Sambrook, J. et al. (1989) Molecular
cloning: a laboratory manual, Cold Spring Harbor Laboratory Press
or Ausubel, F.M. et al. (1994) Current protocols in molecular
biology, John Wiley and sons) in any suitable plasmids or
cosmids, such as, for example, SuperCos1 (Stratagene, La Jolla,
USA).

Example 2: 

Cloning of the gene for PRPP synthetase from Ashbya gossypii
ATCC10895 (AgKPR1)

Cloning of the gene for PRPP synthetase from Ashbya gossypii
(AgKPR1) can take place in two steps. In the first step, it is
possible with the following oligonucleotides to amplify a defined
region of the KPR1 gene from genomic DNA from Ashbya gossypii by
PCR:

KPR5: 5'- GATGCTAGAGACCGCGGGGTGCAAC -3'
KPR3: 5'- TGTCCGCCATGTCGTCTACAATAATA -3'

The PCR can be carried out by a conventional method. The
resulting 330 bp DNA fragment can be cloned by conventional
methods into the vector pGEMT (Promega, Madison, USA) and be
sequenced.
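
Basic properties of such an oligonucleotide pair can be checked
with a few lines of Python before the PCR is set up. The sketch
below reports length, GC content and reverse complement for the
two oligonucleotides given above. The helper names are
hypothetical, and the text does not state which strand each
oligonucleotide anneals to, so the reverse complement is printed
for both without assigning forward or reverse roles.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Oligonucleotide sequences as listed in the text (written 5' to 3').
PRIMERS = {
    "KPR5": "GATGCTAGAGACCGCGGGGTGCAAC",
    "KPR3": "TGTCCGCCATGTCGTCTACAATAATA",
}

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, {gc_percent(seq):.1f}% GC, "
          f"reverse complement 5'-{reverse_complement(seq)}-3'")
```

The same helpers apply unchanged to the oligonucleotide pairs
used in Examples 3 to 5.
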



A genomic cosmid gene bank can be screened by conventional
methods using this nucleotide sequence as probe. A 1911 bp
PstI-HindIII fragment of a cosmid which gives a signal with this
probe can then be subcloned into the vector pBluescript SK+
(Stratagene, La Jolla, USA). The KPR1 gene and incomplete ORFs
which show homology with the UBC6 and UBP9 genes of Saccharomyces
cerevisiae are located on this fragment.



The PRPP synthetase KPR2 and the putative PRPP synthetase KPR4
from Saccharomyces cerevisiae are the enzymes which are most
closely related, with similarities of 80.2% and 79.6%
respectively, to the PRPP synthetase from Ashbya gossypii. The
KPR2 and KPR4 genes from Saccharomyces cerevisiae have 67.6% and
67.8%, respectively, similarity with the KPR1 gene from Ashbya
gossypii. Other enzymes and genes from other organisms are
distinctly more different from the KPR1 gene and from the PRPP
synthetase from Ashbya gossypii.

The sequence comparisons can be carried out, for example, with
the Clustal algorithm with the aid of the PAM250 weighting table
or the Wilbur-Lipman DNA alignment algorithm (as implemented, for
example, in the MegAlign 3.06 program package supplied by
DNAstar). It is not possible with the oligonucleotide pair
described to amplify the genes for the different PRPP synthetases
from Saccharomyces cerevisiae.

It is also possible to use the probe to find a further clone from
the gene bank. This second clone contains a gene which likewise
codes for a PRPP synthetase. This gene is called AgKPR2 and is
distinctly different from AgKPR1. AgKPR2 shows 66% identity with
AgKPR1 at the amino acid level. The gene product of the AgKPR2
gene (SEQ ID NO: 12) was compared with all proteins of the
Swissprot database. The maximum similarity shown by this protein
(88% identity and 95% similarity) is with the KPR3 gene product
from Saccharomyces cerevisiae. The gene product of the AgKPR1
gene is responsible for the predominant part of the PRPP
synthetase activity in Ashbya gossypii. Disruption of the AgKPR1
gene of Ashbya gossypii (analogous to the disruption of other
Ashbya genes as described in Examples 6-8) results in a distinctly
reduced enzyme activity: 3 U/mg of protein instead of 22 U/mg of
protein. See Example 13 for the analysis. Examples 11, 13 and
15 relate to the AgKPR1 gene, but studies of these types can also
be carried out with AgKPR2.

Example 3: 

Cloning of the gene for glutamine-PRPP amidotransferase from
Ashbya gossypii ATCC10895 (AgADE4)

The cloning of the gene for glutamine-PRPP amidotransferase from
Ashbya gossypii (AgADE4) can take place in two steps.
In the first step, it is possible with the following
oligonucleotides to amplify a defined region of the AgADE4 gene
from genomic DNA of Ashbya gossypii by PCR:

ADE4A: 5'- ATATCTTGATGAAGACGTTCACCGT -3'
ADE4B: 5'- GATAATGACGGCTTGGCCGGGAAGA -3'

The PCR can be carried out by a conventional method. The
resulting 360 bp DNA fragment can be cloned by conventional
methods into the vector pGEMT (Promega, Madison, USA) and then be
sequenced.

This sequence can be used as probe to screen a genomic cosmid
gene bank by conventional methods. It is then possible to
subclone a 5369 bp HindIII fragment from a cosmid which gives a
signal with this probe into the vector pBluescript SK+
(Stratagene, La Jolla, USA). The AgADE4 gene and the gene for the
Ashbya homolog for the mitochondrial ABC transporter ATM1 from
Saccharomyces cerevisiae and another open reading frame whose
function is unknown are located on this fragment.

The AgADE4 gene product (glutamine-PRPP amidotransferase) shows
the most evident similarity with the ADE4 gene products from
Saccharomyces cerevisiae and Saccharomyces kluyveri (81% and
86.3% respectively). The corresponding genes show only 68.8% and
72%, respectively, homology, however. The similarity with other
glutamine-PRPP amidotransferases is distinctly less (e.g. only
27.5% similarity with the corresponding enzyme from Bacillus
subtilis). The sequence comparisons can be carried out as
described in Example 2.

It is not possible with the described pair of oligonucleotides to
amplify the ADE4 genes from Saccharomyces cerevisiae or
Saccharomyces kluyveri.

Example 4:

Cloning of the gene for inosine-monophosphate dehydrogenase from
Ashbya gossypii ATCC10895 (AgGUA1)

Cloning of the gene for inosine-monophosphate dehydrogenase from
Ashbya gossypii (AgGUA1) can take place in two steps.

In the first step, it is possible with the following
oligonucleotides to amplify a defined region of the AgGUA1 gene
from genomic DNA from Ashbya gossypii by PCR:

IMP5: 5'- GGCATCAACCTCGAGGAGGCGAACC -3'
IMP3: 5'- CAGACCGGCCTCGACCAGCATCGCC -3'

The PCR can be carried out by a conventional method. The
resulting 230 bp DNA fragment can be cloned by conventional
methods into the vector pGEMT (Promega, Madison, USA) and then be
sequenced.



This sequence can be used as probe to screen a genomic cosmid
gene bank by conventional methods. A 3616 bp ApaI fragment from a
cosmid which gives a signal with this probe can be subcloned into
the vector pBluescript SK+ (Stratagene, La Jolla, USA). The
coding region of the AgGUA1 gene is 1569 bp long and is
interrupted by a 161 bp-long intron. The intron boundaries (5'
splice site AGGTATGT and 3' splice site CAG) can be verified by
cloning and sequencing of AgGUA1 cDNA.

AgGUA1 is the first gene described from Ashbya gossypii having an
intron.

The AgGUA1 gene product (IMP dehydrogenase) shows the most
evident similarity with the 4 IMP dehydrogenases from
Saccharomyces cerevisiae (similarities between 67% and 77.2%).
The similarity with other IMP dehydrogenases is distinctly less.
The sequence comparisons can be carried out as described in
Example 2. Ashbya gossypii appears to have only one gene for this
enzyme. This can be shown by Southern blotting with genomic DNA
from Ashbya gossypii using the abovementioned probe.

The gene from Saccharomyces cerevisiae which codes for the IMP
dehydrogenase (IMH3) which has most similarity with the AgGUA1
gene product has a similarity of 70.2% with the AgGUA1 gene. It
is not possible with the described pair of oligonucleotides to
amplify this gene from Saccharomyces cerevisiae.

Example 5:

Cloning of the gene for guanosine-monophosphate synthetase from
Ashbya gossypii ATCC10895 (AgGUA2)

Cloning of the gene for guanosine-monophosphate synthetase from
Ashbya gossypii (AgGUA2) can take place in two steps.

In the first step, it is possible with the following
oligonucleotides to amplify a defined region of the AgGUA2 gene
from genomic DNA from Ashbya gossypii by PCR:

GUA2A: 5'- TGGACCGGGCGGTGTTCGAGTTGGG -3'
GUA2B: 5'- AGGCTGGATCCTGGCTGCCTCGCGC -3'

The PCR can be carried out by a conventional method. The
resulting 750 bp DNA fragment can be cloned by conventional
methods into the vector pBluescript SK+ (Stratagene, La Jolla,
USA) and then be sequenced.

This sequence can be used as probe to screen a genomic cosmid
gene bank by conventional methods. A 2697 bp ClaI-EcoRV fragment
from a cosmid which gives a signal with this probe can then be
subcloned into the vector pBluescript SK+ (Stratagene, La Jolla,
USA).

The AgGUA2 gene product (GMP synthetase) shows the most evident
similarity with GMP synthetase from Saccharomyces cerevisiae
(similarity 86.6%). The genes for the GMP synthetases from
Saccharomyces cerevisiae and Ashbya gossypii show 71.2% homology.
The similarity of the AgGUA2 gene product with other GMP
synthetases is distinctly less. The sequence comparisons can be
carried out as described in Example 2.

It is not possible with the described pair of oligonucleotides to 
amplify the GMP synthetase gene from Saccharomyces cerevisiae. 

Example 6:

Disruption of the AgADE4 gene from Ashbya gossypii ATCC10895

Disruption of a gene means destroying the functionality of a
genomic copy of the gene either by (a) deleting part of the gene
sequence, or by (b) interrupting the gene by inserting a piece of
foreign DNA into the gene, or by (c) replacing part of the gene by
foreign DNA. Any foreign DNA can be used, but it is preferably a
gene which brings about resistance to any suitable chemical. Any
suitable resistance genes can be used for disruption of genes.

A gene which confers resistance to G418 can be used to disrupt
the AgADE4 gene from Ashbya gossypii ATCC10895. It is possible
for this to be the kanamycin resistance gene from Tn903 under the
control of the TEF promoter of Ashbya gossypii (see, for example,
Yeast 10, pages 1793-1808, 1994, WO9200379). The gene is flanked
5' and 3' by several cleavage sites for restriction
endonucleases, thus constructing a cassette which allows any
desired constructions of gene disruptions by conventional methods
of in vitro manipulation of DNA.

The internal HincII fragment of AgADE4 (between positions 2366
and 2924) can be replaced by a resistance cassette as outlined
above. The resulting construct is called ade4::G418.
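
At the sequence level, disruption type (c) amounts to cutting
out a stretch between two coordinates and putting the cassette
in its place. The Python sketch below shows this on toy strings.
It assumes 1-based, inclusive coordinates, which the text does
not specify, and the function name and the placeholder sequences
are hypothetical; the real AgADE4 fragment and resistance
cassette are not reproduced here.

```python
def replace_fragment(gene: str, start: int, end: int, cassette: str) -> str:
    """Replace positions start..end of gene with cassette.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    if not 1 <= start <= end <= len(gene):
        raise ValueError("coordinates lie outside the sequence")
    # gene[:start - 1] is everything before the fragment,
    # gene[end:] is everything after it.
    return gene[: start - 1] + cassette + gene[end:]


# Toy stand-ins for the cloned gene fragment and the resistance cassette.
toy_gene = "AAAACCCCGGGGTTTT"
toy_cassette = "nnnn"

disrupted = replace_fragment(toy_gene, 5, 12, toy_cassette)
print(disrupted)  # AAAAnnnnTTTT
```

With the real sequence, the call would use the coordinates given
above (2366 and 2924) and the sequence of the G418 cassette.
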

The resulting plasmid can be replicated in E. coli. The BamHI /
BglII fragment of the construct ade4::G418 can be prepared,
purified by agarose gel electrophoresis and subsequent elution of
the DNA from the gel (see Proc. Natl. Acad. Sci. USA 76, 615-619,
1979) and employed for transforming Ashbya gossypii.

Ashbya gossypii can be transformed by protoplast transformation
(Gene 109, 99-105, 1991), but preferably by electroporation
(BioRad Gene Pulser, conditions: cuvettes with slit widths
0.4 mm, 1500 V, 25 µF, 100 Ω). Transformed cells are selected on
G418-containing solid medium.

Resulting G418-resistant clones can be examined by conventional
methods of PCR and Southern blot analysis to find whether the
genomic copy of the AgADE4 gene is in fact destroyed. Clones
whose AgADE4 gene is destroyed are purine-auxotrophic.

Example 7:

Disruption of the AgGUA1 gene from Ashbya gossypii ATCC10895

See Example 6 for a description of the principle of disruption of
genes, the use of a resistance cassette and the transformation of
Ashbya gossypii.

The internal XhoI / KpnI fragment of AgGUA1 (between positions
1620 and 2061) can be replaced by a resistance cassette as
outlined above. The resulting construct is called gua1::G418.

The resulting plasmid can be replicated in E. coli. The XbaI /
BamHI fragment of the construct gua1::G418 can be prepared,
purified by agarose gel electrophoresis and subsequent elution of
the DNA from the gel and employed for transforming Ashbya
gossypii.



Resulting G418-resistant clones can be examined by conventional
methods of PCR and Southern blot analysis to find whether the
genomic copy of the AgGUA1 gene is in fact destroyed. Clones
whose AgGUA1 gene is destroyed are guanine-auxotrophic.

Example 8:

Disruption of the AgGUA2 gene from Ashbya gossypii ATCC10895

See Example 6 for a description of the principle of disruption of
genes, the use of a resistance cassette and the transformation of
Ashbya gossypii.

The internal SalI fragment of AgGUA2 (between positions 1153 and
1219) can be replaced by a resistance cassette as outlined above.
The resulting construct is called gua2::G418.

The resulting plasmid can be replicated in E. coli. The XbaI /
BamHI fragment of the construct gua2::G418 can be prepared,
purified by agarose gel electrophoresis and subsequent elution of
the DNA from the gel and employed for transforming Ashbya
gossypii.

Resulting G418-resistant clones can be examined by conventional
methods of PCR and Southern blot analysis to find whether the
genomic copy of the AgGUA2 gene is in fact destroyed. Clones
whose AgGUA2 gene is destroyed are guanine-auxotrophic.

Example 9:

Cloning of the GAP promoter from Ashbya gossypii

The gene for glyceraldehyde-3-phosphate dehydrogenase from Ashbya
gossypii (AgGAP) can be cloned by generally customary screening
of a genomic Ashbya gossypii cosmid gene bank (see Example 1)
with a probe which was constructed from information on the
sequence of the GAP gene from Saccharomyces cerevisiae.

The 5' nontranslated region of the gene (-373 to -8 region
relative to the translation start) was assumed to be the promoter.
Two cleavage sites for the restriction endonuclease NotI were
inserted flanking this sequence. This region contains the
bona fide TATA box (nt 224-230), two sequence sections (nt 43-51
and 77-85) which correspond to the GCR1 binding element, and a
sequence section (nt 9-20) whose complement partially corresponds
to the RAP1 binding element of Saccharomyces cerevisiae (see, for
example, Johnston, M. and Carlson, M. (1992) pp. 193-281 in The
molecular biology and cellular biology of the yeast
Saccharomyces: Gene expression, Cold Spring Harbor Laboratory
Press). The promoter cassette constructed in this way can be
placed as an easily portable expression signal in front of any
desired gene for overexpression in Ashbya gossypii and results in
pronounced overexpression of genes in Ashbya gossypii, as shown
in Example 11.

Example 10: 

Construction of plasmids having genes under the control of the
GAP promoter from Ashbya gossypii

In order to introduce the GAP promoter cassette 5' of the coding
region of the AgADE4 gene, a unique NotI cleavage site
(recognition sequence GCGGCCGC) was inserted by conventional
methods (e.g. Glover, D.M. and Hames, B.D. (1995) DNA cloning
Vol. 1, IRL press) 8 bp 5' of the ATG start codon.

The GAP promoter cassette can then be inserted via NotI into this
position. An analogous procedure can be used for cloning the GAP
promoter cassette 5' of the coding region of the genes AgKPR1,
AgGUA1, AgGUA2 and for variants of the genes AgADE4, AgKPR1,
AgGUA1 and AgGUA2.

Expression of the genes which harbor the GAP promoter cassette 5'
of the coding region in Ashbya gossypii is controlled by the GAP
promoter.

Example 11:

Overexpression of genes in Ashbya gossypii under the control of
the GAP promoter

Transformation of Ashbya gossypii with the DNA constructs
described in Example 10 can be carried out as described in
Example 6. The recipient clones can preferably, but not
exclusively, be those which, before the transformation to be
carried out here, harbor a disruption of the gene to be
overexpressed. Thus, for example, the Ashbya gossypii mutant
which is described in Example 6 and harbors an ade4::G418
mutation can be transformed with a GAP-ADE4 construct described
in Example 10. Integration of the construct into the genome can
be verified by Southern blot analysis. The resulting clones no
longer have a G418 resistance gene (and are thus G418-sensitive)
and are purine-prototrophic. Overexpression can be demonstrated
by Northern blot analysis or detection of the enzymatic activity
(as described in Example 12). On expression of the AgADE4 gene
under the natural promoter, 0.007 U/mg of protein can be
detected. On expression of the AgADE4 gene under the GAP
promoter, 0.382 U/mg of protein can be detected.

A sequence section of the coding region of the AgADE4 gene can be
used as probe. An analogous procedure can be used with AgKPR1,
AgGUA1, AgGUA2 and for variants of all these genes. In addition,
combinations of one of these genes together with other genes can
be introduced in this way into the genome of Ashbya gossypii.

The wild type Ashbya gossypii has a specific PRPP synthetase
activity of 22 U/mg of protein (see Example 13 for analysis of
the PRPP synthetase). On expression of the AgKPR1 gene with the
GAP promoter, 855 U/mg of protein is detectable.
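
The specific activities quoted in this Example can be turned
into fold-overexpression factors by simple division. The Python
sketch below does this for the AgADE4 and AgKPR1 values given
above, which come out at roughly 55-fold and 39-fold
respectively. The function name is hypothetical.

```python
def fold_change(activity: float, baseline: float) -> float:
    """Ratio of a measured specific activity to the baseline activity."""
    if baseline <= 0:
        raise ValueError("baseline activity must be positive")
    return activity / baseline


# Specific activities in U/mg of protein, as quoted in the text.
ade4_fold = fold_change(0.382, 0.007)  # GAP promoter vs natural promoter
kpr1_fold = fold_change(855, 22)       # GAP promoter vs wild type

print(f"AgADE4 under the GAP promoter: {ade4_fold:.1f}-fold")
print(f"AgKPR1 under the GAP promoter: {kpr1_fold:.1f}-fold")
```
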

Example 12: 

Variants of the AgADE4 gene product (glutamine-PRPP
amidotransferase) no longer subject to feedback inhibition by
purines or intermediates of purine synthesis.

Glutamine-PRPP amidotransferases are subject to feedback
inhibition by purine nucleotides. This inhibition is found in
numerous organisms (see, for example, Switzer, R.L. (1989)
Regulation of bacterial Glutamine Phosphoribosylpyrophosphate
Amidotransferase, in: Allosteric enzymes pp. 129-151, CRC Press,
Boca Raton).

The glutamine-PRPP amidotransferase from Ashbya gossypii is
likewise inhibited by AMP or GMP (see Figure). The activity of
glutamine-phosphoribosyl-pyrophosphate amidotransferase from
Ashbya gossypii can be measured as described in Messenger and
Zalkin (1979) J. Biol. Chem. 254, pages 3382-3392.

Modified glutamine-phosphoribosyl-pyrophosphate amidotransferases
no longer inhibited by purines can be constructed. It is evident
that overexpression of such deregulated enzymes will enhance
purine metabolism distinctly more than overexpression of enzymes
subject to feedback inhibition. Alterations in the sequence of
the AgADE4 gene can be brought about by conventional methods
(e.g. Glover, D.M. and Hames, B.D. (1995) DNA cloning Vol. 1, IRL
press). It is possible, for example, for the following amino
acids in glutamine-phosphoribosyl-pyrophosphate amidotransferase
to be replaced:

The codon which codes for aspartate at position 310 can be
replaced by a codon which codes for valine. The codon which codes
for lysine at position 333 can be replaced by a codon which codes
for alanine. The codon which codes for alanine at position 417
can be replaced by a codon which codes for tryptophan. It is
additionally possible to construct AgADE4 genes which harbor
combinations of these substitutions.

All enzymes which carry D310V, K333A, A417W or any combination of
substitutions which comprise D310V or K333A show diminished
feedback inhibition by AMP and GMP (see Figure). This can be
shown, for example, by expressing the enzymes in Ashbya gossypii
(see Example 11).
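
Substitution codes such as D310V can be read as "original
residue, position, new residue". The Python sketch below applies
a list of such codes to a protein sequence and checks that the
reference really carries the stated residue at each position. It
assumes 1-based positions and one-letter amino acid codes; the
function name and the toy sequence are hypothetical, and the
AgADE4 protein sequence itself is not reproduced here.

```python
import re

# Pattern for codes such as "D310V": old residue, position, new residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(protein: str, substitutions: list[str]) -> str:
    """Return the protein with all listed substitutions applied.

    Positions are 1-based. A ValueError is raised if a code is malformed
    or if the reference residue does not match the code.
    """
    residues = list(protein)
    for code in substitutions:
        match = SUBSTITUTION.match(code)
        if match is None:
            raise ValueError(f"not a substitution code: {code}")
        old, new = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        if residues[position - 1] != old:
            raise ValueError(f"{code}: reference has no {old} at {position}")
        residues[position - 1] = new
    return "".join(residues)


# Toy four-residue sequence standing in for the real protein.
toy_protein = "MDKA"
print(apply_substitutions(toy_protein, ["D2V", "K3A"]))  # MVAA
```

With the full AgADE4 sequence, the same call with
["D310V", "K333A"] would produce one of the combined variants
described above.
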

10 

Example 13: 

Variants of the AgKPRl gene product (PRPP synthetase) no longer 
subject to feedback inhibition by purines or intermediates of 
purine synthesis. 

15 

PRPP synthetases are subject to feedback inhibition by purines, 
pyrimidines and amino acids. This inhibition is found in numerous 
organisms (see, for example, Gibson, K.J. et al. (1982) J. Biol. 
Chem. 257, 2391-2396; Tatibana, M. et al. (1995) Adv., Enzyme 
20 Regul. 35, 229-249 and papers quoted therein). 

In clinical medical research there are descriptions of cases of
hereditary gout based on enhanced purine biosynthesis. The
molecular cause thereof is what is called superactivity of human
PRPP synthetase (see, for example, Amer. J. Med. 55 (1973)
232-242; J. Clin. Invest. 96 (1995) 2133-2141; J. Biol. Chem. 268
(1993) 26476-26481). The basis thereof may be a mutation which
leads to the enzyme no longer being subject to feedback
inhibition by purines.

The activity of the PRPP synthetase from Ashbya gossypii can be
measured as described in Anal. Biochem. 98 (1979) 254-263 or J.
Bacteriol. 174 (1992) 6852-6856. The specific activity (U/mg) is
defined via the amount of resulting product (nmol/min/g of
protein).
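
As a hedged illustration of the definition above, specific activity can be computed as product formed per unit time per amount of protein. The function name and the numbers are hypothetical; the protein amount is passed in whatever mass unit the assay reports, so the result is in nmol/min per that unit.

```python
def specific_activity(product_nmol: float, minutes: float, protein_amount: float) -> float:
    """Return product formed per minute per unit of protein.

    protein_amount is in the mass unit chosen by the caller; the result is
    expressed in nmol/min per that same unit.
    """
    if minutes <= 0 or protein_amount <= 0:
        raise ValueError("minutes and protein_amount must be positive")
    return product_nmol / (minutes * protein_amount)


# Hypothetical assay values, for illustration only.
print(specific_activity(product_nmol=120.0, minutes=10.0, protein_amount=0.5))  # 24.0
```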

It is possible to construct modified PRPP synthetases no longer
inhibited by purines. It is evident that overexpression of such
deregulated enzymes enhances purine metabolism distinctly more
than does overexpression of enzymes subject to feedback
inhibition. Modifications of the sequence of the AgKPR1 gene may
be brought about by conventional methods (e.g. Glover, D.M. and
Hames, B.D. (1995) DNA cloning Vol. 1, IRL Press). It is
possible, for example, to exchange the following amino acids of
the PRPP synthetase:

The codon which codes for leucine at position 131 can be replaced
by a codon which codes for isoleucine. The codon which codes for
histidine at position 196 can be replaced by a codon which codes
for glutamine.

All enzymes which have one of these amino acid exchanges (L131I
or H196Q) show a reduced feedback inhibition by purines. Figure 2
shows this by the example of ADP.

This can be shown after expression of the corresponding enzymes
in Ashbya gossypii, which can be carried out in accordance with
Example 11.

Example 14: 

Variants of the AgGUA1 gene product (IMP dehydrogenase) no longer
subject to feedback inhibition by purines or intermediates of
purine synthesis.



Example 15: 

Effects of the enhancement and/or optimization of enzymes of
purine metabolism and their genes on riboflavin production in
Ashbya gossypii

The original strain Ashbya gossypii ATCC10895 can be tested for 
riboflavin productivity in shaken flasks, comparing with clones 
which are derived therefrom and harbor chromosomal copies of 
genes under the control of the GAP promoter (as described in 
Example 11). It is possible to use for this purpose 300 ml shaken 
flasks with 20 ml of YPD medium (Sambrook, J. et al. (1989) 
Molecular cloning: a laboratory manual, Cold Spring Harbor 
Laboratory Press), incubating at a temperature of 28°C. 



After 2 days, the control strain produces on average 14.5 mg of
riboflavin per liter of culture broth. Strains which overexpress
genes for enzymes of purine metabolism (as shown, for example, in
Example 11), or overexpress genes for optimized enzymes of purine
metabolism (for example as in Examples 12, 13 and 14), produce
more riboflavin. Thus, the strain which overexpresses
AgADE4D310VK333A (Example 12) produces on average 45.4 mg of
riboflavin per liter of culture broth in 2 days.

The strain which overexpresses AgKPR1 with the GAP promoter
produces not 14 mg/l (like the WT) but 36 mg/l riboflavin. The
strain which overexpresses AgKPR1H196Q with the GAP promoter
produces 51 mg/l riboflavin.
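
The titers reported in this example can be put on a common scale by expressing each as a fold increase over the control value quoted alongside it. The sketch below uses only the figures stated above; the helper name `fold_increase` and the strain labels are illustrative.

```python
def fold_increase(titer_mg_per_l: float, control_mg_per_l: float) -> float:
    """Ratio of a strain's riboflavin titer to the control titer."""
    return titer_mg_per_l / control_mg_per_l


# (label, titer in mg/l, control titer in mg/l) as quoted in the text.
results = [
    ("AgADE4 D310V K333A", 45.4, 14.5),
    ("AgKPR1", 36.0, 14.0),
    ("AgKPR1 H196Q", 51.0, 14.0),
]

for label, titer, control in results:
    print(f"{label}: {fold_increase(titer, control):.1f}-fold over control")
```

On these figures the increases are roughly 3.1-fold, 2.6-fold and 3.6-fold, respectively.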

Figure 1: 

Measurement of the activity of Gln-PRPP amidotransferase from A.
gossypii and of modified forms of the enzyme as a function of the
concentration of adenosine 5'-monophosphate (AMP) and guanosine
5'-monophosphate (GMP).

WT: Gln-PRPP amidotransferase

A417W: Gln-PRPP amidotransferase, alanine at position 417
replaced by tryptophan.

K333A: Gln-PRPP amidotransferase, lysine at position 333 replaced
by alanine.

D310VK333A: Gln-PRPP amidotransferase, aspartate at position 310
replaced by valine and lysine at position 333 replaced by
alanine.

Figure 2: 

Measurement of the activity of the PRPP synthetase from A.
gossypii and of modified forms of the enzyme as a function of the
concentration of adenosine 5'-diphosphate (ADP).

WT: PRPP synthetase

L131I: PRPP synthetase, leucine at position 131 replaced by
isoleucine

H196Q: PRPP synthetase, histidine at position 196 replaced by
glutamine

H196Q, L131I: PRPP synthetase, histidine at position 196 replaced
by glutamine and leucine at position 131 replaced by isoleucine

SEQUENCE LISTING

GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: BASF Aktiengesellschaft
(B) STREET: Carl-Bosch-Strasse 38
(C) CITY: Ludwigshafen
(E) COUNTRY: Federal Republic of Germany
(F) POSTAL CODE: D-67056
(G) TELEPHONE: 0621/6048526
(H) TELEFAX: 0621/6043123
(I) TELEX: 1762175170

(ii) TITLE OF APPLICATION: Genes of purine biosynthesis from Ashbya
gossypii and their use in microbial riboflavin biosynthesis

(iii) NUMBER OF SEQUENCES: 13

(iv) COMPUTER-READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)


INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1911 base pairs
(B) TYPE: Nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(ix) FEATURES:
(A) NAME/KEY: 5'UTR
(B) LOCATION: 1..625

(ix) FEATURES:
(A) NAME/KEY: CDS
(B) LOCATION: 626..1582

(ix) FEATURES:
(A) NAME/KEY: 3'UTR
(B) LOCATION: 1583..1911

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GGTAGTCGCT CATCGACAGA CACAATCGCG TGTTCTCTCT GAATCGTCCA TTGGGTGTCA 60 

GCATCCTGAT CGCGGGCGGA TGGAATGGGT AATCATTAGG AAACACCAAT GTCCCATGGT 120 

ATTGTCCGTC CTCGTATGGT GTCTCAGGAG GACCCGTGAT CACGTAGTGC CACACCAGGA 180 

TATTGTCTTC CTTTGGTGCT GCCACGATGT AGGGCGGGGG GTTCTCGGTC ATCATTTTGT 240 

ACTCCTTTGA GAGCCGCTTG TACGCCTGTC TTGATGCCAT CTTGCCTACT ATTAGTTTCT 300 

CACCACTTCC CGCCAAACAA TCTGCACTTT ACGAGCGCTA TCTATCCCTC GGGTCGCTCT 360 

AGTTGATTAT TGGCGAAACT GATAGTTCAG GTACTTCCAT GATGCGGTCA TATCCACGTA 420 

TGTGATCACG TGATCATCAG CCATGCTGCC AGCTCACGGG CCTGCCTACA CTATTGGAGG 480 

CTCTGTGAGT CATGATTTAT TGCATATCAA GCCCAGATAG TCGTTGGGGA TACTACCGTT 540 

GCCGCGATGA GCTCCGATAT TAAGTTGTAG CCAAAAATTT TAACGGATGA CTTCTTAACA 600 



GTTATTGACG CCGCAATCCT ACGCC ATG TCG TCC AAT AGC ATA AAG CTG CTA 652
Met Ser Ser Asn Ser Ile Lys Leu Leu
1 5

GCA GGT TCG CAC CCG GAC CTA GCT GAG AAG GTC TCC GTT CGC CTA 700
Ala Gly Asn Ser His Pro Asp Leu Ala Glu Lys Val Ser Val Arg Leu
10 15 20 25

GGT GTA CCA CTT TCG AAG ATT GGA GTG TAT CAC TAC TCT AAC AAA GAG 748
Gly Val Pro Leu Ser Lys Ile Gly Val Tyr His Tyr Ser Asn Lys Glu
30 35 40

ACG TCA GTT ACT ATC GGC GAA AGT ATC CGT GAT GAA GAT GTC TAC ATC 796
Thr Ser Val Thr Ile Gly Glu Ser Ile Arg Asp Glu Asp Val Tyr Ile
45 50 55

ATC CAG ACA GGA ACG GGG GAG CAG GAA ATC AAC GAC TTC CTC ATG GAA 844
Ile Gln Thr Gly Thr Gly Glu Gln Glu Ile Asn Asp Phe Leu Met Glu
60 65 70

CTG CTC ATC ATG ATC CAT GCC TGC CGG TCA GCC TCT GCG CGG AAG ATC 892
Leu Leu Ile Met Ile His Ala Cys Arg Ser Ala Ser Ala Arg Lys Ile
75 80 85

ACA GCG GTT ATA CCA AAC TTC CCT TAC GCA AGA CAA GAC AAA AAG GAC 940
Thr Ala Val Ile Pro Asn Phe Pro Tyr Ala Arg Gln Asp Lys Lys Asp
90 95 100 105

AAG TCG CGA GCA CCG ATA ACT GCC AAG CTG GTG GCC AAG ATG CTA GAG 988
Lys Ser Arg Ala Pro Ile Thr Ala Lys Leu Val Ala Lys Met Leu Glu
110 115 120

ACC GCG GGG TGC AAC CAC GTT ATC ACG ATG GAT TTG CAC GCG TCT CAA 1036
Thr Ala Gly Cys Asn His Val Ile Thr Met Asp Leu His Ala Ser Gln
125 130 135

ATT CAG GGT TTC TTC CAC ATT CCA GTG GAC AAC CTA TAT GCA GAG CCG 1084
Ile Gln Gly Phe Phe His Ile Pro Val Asp Asn Leu Tyr Ala Glu Pro
140 145 150

AAC ATC CTG CAC TAC ATC CAA CAT AAT GTG GAC TTC CAG AAT AGT ATG 1132
Asn Ile Leu His Tyr Ile Gln His Asn Val Asp Phe Gln Asn Ser Met
155 160 165

TTG GTC GCG CCA GAC GCG GGG TCG GCG AAG CGC ACG TCG ACG CTT TCG 1180
Leu Val Ala Pro Asp Ala Gly Ser Ala Lys Arg Thr Ser Thr Leu Ser
170 175 180 185

GAC AAG CTG AAT CTC AAC TTC GCG TTG ATC CAC AAA GAA CGG CAG AAG 1228
Asp Lys Leu Asn Leu Asn Phe Ala Leu Ile His Lys Glu Arg Gln Lys
190 195 200

GCG AAC GAG GTC TCG CGG ATG GTG TTG GTG GGT GAT GTC GCC GAC AAG 1276
Ala Asn Glu Val Ser Arg Met Val Leu Val Gly Asp Val Ala Asp Lys
205 210 215

TCC TGT ATT ATT GTA GAC GAC ATG GCG GAC ACG TGC GGA ACG CTA GTG 1324
Ser Cys Ile Ile Val Asp Asp Met Ala Asp Thr Cys Gly Thr Leu Val
220 225 230

AAG GCC ACT GAC ACG CTG ATC GAA AAT TGT GCG AAA GAA GTG ATT GCC 1372
Lys Ala Thr Asp Thr Leu Ile Glu Asn Cys Ala Lys Glu Val Ile Ala
235 240 245

ATT GTG ACA CAC GGT ATA TTT TCT GGC GGC GCC CGC GAG AAG TTG CGC 1420
Ile Val Thr His Gly Ile Phe Ser Gly Gly Ala Arg Glu Lys Leu Arg
250 255 260 265

AAC AGC AAG CTG GCA CGG ATC GTA AGC ACA AAT ACG GTG CCA GTG GAC 1468
Asn Ser Lys Leu Ala Arg Ile Val Ser Thr Asn Thr Val Pro Val Asp
270 275 280

CTC AAT CTA GAT ATC TAC CAC CAA ATT GAC ATT AGT GCC ATT TTG GCC 1516
Leu Asn Leu Asp Ile Tyr His Gln Ile Asp Ile Ser Ala Ile Leu Ala
285 290 295

GAG GCA ATT AGA AGG CTT CAC AAC GGG GAA AGT GTG TCG TAC CTG TTC 1564
Glu Ala Ile Arg Arg Leu His Asn Gly Glu Ser Val Ser Tyr Leu Phe
300 305 310

AAT AAC GCT GTC ATG TAGTGCTGTC AGTGGCAGAT GCATGATCGC TGGCCTAATT 1619
Asn Asn Ala Val Met
315

ATCTGTGTAA GTTGATACAA TGCAGTAAAT ACAGTACATA AAACTGAATG TTTTTCACTT 1679

AGGGGTGCTT TGTTGTTCTG ATAGCGTGTG TGCGAATTTG GAGGTGAAAG TTGAACATCA 1739

CGTAATGAAT ACAAACAAGA TTGCACATTA GGAAAAGCGA TAAATTATTT ATTATTTGCA 1799

ACTGGCCTTT GAGCGTTTAA GCCTGAACAT TTTTGCCCTT TTGTTTGACC GTACCGTTAT 1859

CACTCGTCCT TATATATGGC TATCCTTCTC TTCCGGAACT TCTTCGAGCG TA 1911
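
The codon-by-codon translation shown under the coding region of SEQ ID NO: 1 can be reproduced with a small script. This is a minimal sketch that assumes the standard genetic code; the names `CODON_TABLE`, `THREE_LETTER` and `translate` are illustrative and not taken from any tool used to prepare the listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(cds: str) -> str:
    """Translate a coding sequence into three-letter residues, stopping at a stop codon."""
    cds = cds.replace(" ", "").upper()
    residues = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        residues.append(THREE_LETTER[aa])
    return " ".join(residues)


# First nine codons of the CDS in SEQ ID NO: 1 (positions 626 to 652).
print(translate("ATG TCG TCC AAT AGC ATA AAG CTG CTA"))
# Met Ser Ser Asn Ser Ile Lys Leu Leu
```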



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 318 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Ser Ser Asn Ser Ile Lys Leu Leu Ala Gly Asn Ser His Pro Asp
1 5 10 15

Leu Ala Glu Lys Val Ser Val Arg Leu Gly Val Pro Leu Ser Lys Ile
20 25 30

Gly Val Tyr His Tyr Ser Asn Lys Glu Thr Ser Val Thr Ile Gly Glu
35 40 45

Ser Ile Arg Asp Glu Asp Val Tyr Ile Ile Gln Thr Gly Thr Gly Glu
50 55 60

Gln Glu Ile Asn Asp Phe Leu Met Glu Leu Leu Ile Met Ile His Ala
65 70 75 80

Cys Arg Ser Ala Ser Ala Arg Lys Ile Thr Ala Val Ile Pro Asn Phe
85 90 95

Pro Tyr Ala Arg Gln Asp Lys Lys Asp Lys Ser Arg Ala Pro Ile Thr
100 105 110

Ala Lys Leu Val Ala Lys Met Leu Glu Thr Ala Gly Cys Asn His Val
115 120 125

Ile Thr Met Asp Leu His Ala Ser Gln Ile Gln Gly Phe Phe His Ile
130 135 140

Pro Val Asp Asn Leu Tyr Ala Glu Pro Asn Ile Leu His Tyr Ile Gln
145 150 155 160

His Asn Val Asp Phe Gln Asn Ser Met Leu Val Ala Pro Asp Ala Gly
165 170 175

Ser Ala Lys Arg Thr Ser Thr Leu Ser Asp Lys Leu Asn Leu Asn Phe
180 185 190

Ala Leu Ile His Lys Glu Arg Gln Lys Ala Asn Glu Val Ser Arg Met
195 200 205

Val Leu Val Gly Asp Val Ala Asp Lys Ser Cys Ile Ile Val Asp Asp
210 215 220

Met Ala Asp Thr Cys Gly Thr Leu Val Lys Ala Thr Asp Thr Leu Ile
225 230 235 240

Glu Asn Cys Ala Lys Glu Val Ile Ala Ile Val Thr His Gly Ile Phe
245 250 255

Ser Gly Gly Ala Arg Glu Lys Leu Arg Asn Ser Lys Leu Ala Arg Ile
260 265 270

Val Ser Thr Asn Thr Val Pro Val Asp Leu Asn Leu Asp Ile Tyr His
275 280 285

Gln Ile Asp Ile Ser Ala Ile Leu Ala Glu Ala Ile Arg Arg Leu His
290 295 300

Asn Gly Glu Ser Val Ser Tyr Leu Phe Asn Asn Ala Val Met
305 310 315
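
The CDS coordinates given in a feature table can be cross-checked against the stated protein length: an inclusive nucleotide range of n bases encodes n/3 codons, one of which is the stop codon. The following sketch assumes the CDS range includes the stop codon, as it does for SEQ ID NO: 1; the function name is hypothetical.

```python
def protein_length_from_cds(start: int, end: int) -> int:
    """Number of encoded residues for an inclusive, 1-based CDS range that ends with a stop codon."""
    n_bases = end - start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("CDS range must cover a positive multiple of three bases")
    return n_bases // 3 - 1  # subtract the stop codon


# SEQ ID NO: 1 lists the CDS at 626..1582; SEQ ID NO: 2 is stated as 318 amino acids.
print(protein_length_from_cds(626, 1582))  # 318
```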

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5369 base pairs
(B) TYPE: Nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(ix) FEATURES:
(A) NAME/KEY: 5'UTR
(B) LOCATION: 1..54

(ix) FEATURES:
(A) NAME/KEY: CDS
(B) LOCATION: 55..1482

(ix) FEATURES:
(A) NAME/KEY: CDS
(B) LOCATION: 1767..3299

(ix) FEATURES:
(A) NAME/KEY: CDS
(B) LOCATION: 3588..4703

(ix) FEATURES:
(A) NAME/KEY: 3'UTR
(B) LOCATION: 4704..5369

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

AAGCTTGACC TTGGCTGGCA CTTGAGTCGG CAGACAGGTG GACTAACCCG AGCA ATG 57
Met
1

GAT CGT GGT TGT AAA GGT ATC TCT TAT GTG CTC AGT GCA ATG GTT TTT 105
Asp Arg Gly Cys Lys Gly Ile Ser Tyr Val Leu Ser Ala Met Val Phe
5 10 15

CAC ATA ATA CCG ATT ACA TTT GAA ATA TCG ATG GTA TGT GGC ATA TTG 153
His Ile Ile Pro Ile Thr Phe Glu Ile Ser Met Val Cys Gly Ile Leu
20 25 30

ACA TAC CAG TTT GGT GCT TCC TTC GCT GCT ATA ACA TTC TCG ACT ATG 201
Thr Tyr Gln Phe Gly Ala Ser Phe Ala Ala Ile Thr Phe Ser Thr Met
35 40 45

CTT CTT TAC TCC ATC TTT ACT TTC AGA ACG ACG GCG TGG CGC ACA CGG 249
Leu Leu Tyr Ser Ile Phe Thr Phe Arg Thr Thr Ala Trp Arg Thr Arg
50 55 60 65

TTT AGG CGT GAT GCG AAC AAG GCT GAC AAT AAG GCC GCT AGT GTG GCA 297
Phe Arg Arg Asp Ala Asn Lys Ala Asp Asn Lys Ala Ala Ser Val Ala
70 75 80

TTG GAT TCC CTA ATA AAT TTT GAA GCT GTA AAG TAT TTC AAT AAC GAG 345
Leu Asp Ser Leu Ile Asn Phe Glu Ala Val Lys Tyr Phe Asn Asn Glu
85 90 95

AAG TAC CTT GCG GAC AAG TAT CAC ACA TCC TTG ATG AAG TAC CGG GAT 393
Lys Tyr Leu Ala Asp Lys Tyr His Thr Ser Leu Met Lys Tyr Arg Asp
100 105 110

TCC CAG ATA AAG GTC TCG CAA TCG CTG GCG TTT TTG AAC ACC GGC CAG 441
Ser Gln Ile Lys Val Ser Gln Ser Leu Ala Phe Leu Asn Thr Gly Gln
115 120 125

AAC CTA ATT TTT ACC ACT GCA CTG ACT GCA ATG ATG TAT ATG GCC TGT 489
Asn Leu Ile Phe Thr Thr Ala Leu Thr Ala Met Met Tyr Met Ala Cys
130 135 140 145

AAT GGT GTT ATG CAG GGC TCT CTT ACA GTG GGG GAT CTT GTG TTA ATT 537
Asn Gly Val Met Gln Gly Ser Leu Thr Val Gly Asp Leu Val Leu Ile
150 155 160

AAT CAA CTG GTA TTC CAG CTC TCC GTG CCA CTA AAC TTC CTT GGT AGC 585
Asn Gln Leu Val Phe Gln Leu Ser Val Pro Leu Asn Phe Leu Gly Ser
165 170 175

GTC TAC CGT GAT CTC AAG CAG TCT CTG ATA GAT ATG GAA TCT TTA TTT 633
Val Tyr Arg Asp Leu Lys Gln Ser Leu Ile Asp Met Glu Ser Leu Phe
180 185 190

AAA CTG CAA AAA AAT CAG GTC ACA ATT AAG AAC TCC CCA AAT GCC CAG 681
Lys Leu Gln Lys Asn Gln Val Thr Ile Lys Asn Ser Pro Asn Ala Gln
195 200 205

AAC CTA CCA ATA CAC AAA CCG TTG GAT ATT CGC TTT GAA AAT GTT ACG 729
Asn Leu Pro Ile His Lys Pro Leu Asp Ile Arg Phe Glu Asn Val Thr
210 215 220 225

TTT GGC TAT GAC CCG GAG CGG CGT ATA TTG AAC AAT GTT TCG TTT ACC 777
Phe Gly Tyr Asp Pro Glu Arg Arg Ile Leu Asn Asn Val Ser Phe Thr
230 235 240

ATC CCA GCT GGA ATG AAG ACT GCC ATA GTA GGC CCA TCG GGC TCG GGG 825
Ile Pro Ala Gly Met Lys Thr Ala Ile Val Gly Pro Ser Gly Ser Gly
245 250 255

AAG TCC ACC ATT TTG AAG CTC GTA TTT AGA TTC TAT GAG CCC GAG CAA 873
Lys Ser Thr Ile Leu Lys Leu Val Phe Arg Phe Tyr Glu Pro Glu Gln
260 265 270

GGT CGT ATC CTA GTT GGC GGC ACA GAT ATC CGC GAT TTA GAC TTG CTT 921
Gly Arg Ile Leu Val Gly Gly Thr Asp Ile Arg Asp Leu Asp Leu Leu
275 280 285

TCT TTA CGG AAG GCT ATC GGT GTC GTG CCC CAA GAT ACT CCT CTC TTC 969
Ser Leu Arg Lys Ala Ile Gly Val Val Pro Gln Asp Thr Pro Leu Phe
290 295 300 305

AAT GAC ACA ATC TGG GAG AAT GTT AAA TTC GGC AAT ATC AGT TCC TCT 1017
Asn Asp Thr Ile Trp Glu Asn Val Lys Phe Gly Asn Ile Ser Ser Ser
310 315 320

GAC GAT GAG ATT CTC AGG GCC ATA GAA AAA GCT CAA CTC ACG AAG CTA 1065
Asp Asp Glu Ile Leu Arg Ala Ile Glu Lys Ala Gln Leu Thr Lys Leu
325 330 335

CTC CAG AAC CTA CCA AAG GGC GCT TCC ACC GTT GTA GGG GAG CGC GGT 1113
Leu Gln Asn Leu Pro Lys Gly Ala Ser Thr Val Val Gly Glu Arg Gly
340 345 350

TTG ATG ATC AGC GGA GGT GAG AAA CAA AGG CTT GCT ATT GCT CGT GTG 1161
Leu Met Ile Ser Gly Gly Glu Lys Gln Arg Leu Ala Ile Ala Arg Val
355 360 365

CTT TTG AAG GAC GCT CCG CTG ATG TTT TTC GAC GAG GCT ACA AGT GCT 1209
Leu Leu Lys Asp Ala Pro Leu Met Phe Phe Asp Glu Ala Thr Ser Ala
370 375 380 385

CTG GAT ACA CAC ACA GAG CAG GCA CTC TTG CAC ACC ATT CAG CAG AAC 1257
Leu Asp Thr His Thr Glu Gln Ala Leu Leu His Thr Ile Gln Gln Asn
390 395 400

TTT TCT TCC AAT TCA AAG ACG AGC GTT TAG GTT GCC CAT AGA CTG CGC 1305
Phe Ser Ser Asn Ser Lys Thr Ser Val Tyr Val Ala His Arg Leu Arg
405 410 415

ACA ATC GCT GAT GCA GAT AAG ATC ATT GTT CTT GAA CAA GGT TCT GTC 1353
Thr Ile Ala Asp Ala Asp Lys Ile Ile Val Leu Glu Gln Gly Ser Val
420 425 430

CGC GAA GAG GGC ACA CAC AGC TCG CTG TTA GCG TCA CAA GGA TCC CTA 1401
Arg Glu Glu Gly Thr His Ser Ser Leu Leu Ala Ser Gln Gly Ser Leu
435 440 445

TAC CGG GGT CTG TGG GAT ATT CAG GAA AAC CTA ACG CTT CCG GAA CGG 1449
Tyr Arg Gly Leu Trp Asp Ile Gln Glu Asn Leu Thr Leu Pro Glu Arg
450 455 460 465

CCT GAG CAG TCA ACC GGA TCT CAG CAT GCA TAGACGTCTG ACTAGAGATT 1499
Pro Glu Gln Ser Thr Gly Ser Gln His Ala
470 475

ATATAATAAC CCTCGAGCCA AAATTATACG GCGCTAACAA GTAAAAATTT TAGTTACTTT 1559 

TCTGACTTCT CTACGCTGAC TTCTCTACCC TTCTAACATA GTTAATTGAA GTAGTGGTTA 1619 

ATGACGACTG CATTTTATTA TTGTCCACTT TGCATTAGAA GTACTAGTGC TTAAGCGCTC 1679 

TTTAGGCCGC TTTCTTCTTC TTTGTCAGGC CGCAAGGTAA AGGAAGCACC AACGGATTGC 1739 

TACCGCTGCT ATTCCTGCTC TCTCAAG ATG TGT GGC ATA TTA GGC GTT GTG 1790
Met Cys Gly Ile Leu Gly Val Val
1 5

CTA GCC GAT CAG TCG AAG GTG GTC GCC CCT GAG TTG TTT GAT GGC TCA 1838
Leu Ala Asp Gln Ser Lys Val Val Ala Pro Glu Leu Phe Asp Gly Ser
10 15 20

CTG TTC TTA CAG CAT CGC GGT CAA GAT GCT GCC GGG ATT GCT ACG TGC 1886
Leu Phe Leu Gln His Arg Gly Gln Asp Ala Ala Gly Ile Ala Thr Cys
25 30 35 40

GGC CCC GGT GGG CGC TTG TAC CAA TGT AAG GGC AAT GGT ATG GCA CGG 1934
Gly Pro Gly Gly Arg Leu Tyr Gln Cys Lys Gly Asn Gly Met Ala Arg
45 50 55

GAC GTG TTC ACG CAA GCT CGG ATG TCA GGG TTG GTT GGC TCT ATG GGG 1982
Asp Val Phe Thr Gln Ala Arg Met Ser Gly Leu Val Gly Ser Met Gly
60 65 70

ATT GCA CAC CTG AGA TAT CCC ACT GCA GGC TCC AGT GCG AAC TCA GAA 2030
Ile Ala His Leu Arg Tyr Pro Thr Ala Gly Ser Ser Ala Asn Ser Glu
75 80 85

GCG CAG CCA TTC TAT GTG AAT AGT CCC TAC GGA ATT TGC ATG AGT CAT 2078
Ala Gln Pro Phe Tyr Val Asn Ser Pro Tyr Gly Ile Cys Met Ser His
90 95 100

AAT GGT AAT CTG GTG AAC ACG ATG TCT CTA CGT AGA TAT CTT GAT GAA 2126
Asn Gly Asn Leu Val Asn Thr Met Ser Leu Arg Arg Tyr Leu Asp Glu
105 110 115 120

GAC GTT CAC CGT CAT ATT AAC ACG GAC AGC GAT TCT GAG CTA CTG CTT 2174
Asp Val His Arg His Ile Asn Thr Asp Ser Asp Ser Glu Leu Leu Leu
125 130 135

AAT ATA TTT GCC GCG GAG CTG GAA AAG TAC AAC AAA TAT CGT GTG AAC 2222
Asn Ile Phe Ala Ala Glu Leu Glu Lys Tyr Asn Lys Tyr Arg Val Asn
140 145 150

AAC GAT GAT ATA TTT TGT GCT CTA GAG GGT GTT TAC AAA CGT TGT CGC 2270
Asn Asp Asp Ile Phe Cys Ala Leu Glu Gly Val Tyr Lys Arg Cys Arg
155 160 165

GGT GGC TAT GCT TGT GTT GGC ATG TTG GCG GGA TAT GGA TTG TTT GGT 2318
Gly Gly Tyr Ala Cys Val Gly Met Leu Ala Gly Tyr Gly Leu Phe Gly
170 175 180

TTC CGG GAC CCC AAT GGG ATC AGG CCG CTA TTG TTT GGT GAG CGC GTC 2366
Phe Arg Asp Pro Asn Gly Ile Arg Pro Leu Leu Phe Gly Glu Arg Val
185 190 195 200

AAC GAT GAC GGC ACC ATG GAC TAC ATG CTA GCG TCC GAA AGT GTC GTT 2414
Asn Asp Asp Gly Thr Met Asp Tyr Met Leu Ala Ser Glu Ser Val Val
205 210 215

CTT AAG GCC CAC CGC TTC CAA AAC ATA CGT GAT ATT CTT CCC GGC CAA 2462
Leu Lys Ala His Arg Phe Gln Asn Ile Arg Asp Ile Leu Pro Gly Gln
220 225 230

GCC GTC ATT ATC CCT AAA ACG TGC GGC TCC AGT CCA CCA GAG TTC CGG 2510
Ala Val Ile Ile Pro Lys Thr Cys Gly Ser Ser Pro Pro Glu Phe Arg
235 240 245

CAG GTA GTG CCA ATT GAG GCC TAC AAA CCG GAC TTG TTT GAG TAC GTG 2558
Gln Val Val Pro Ile Glu Ala Tyr Lys Pro Asp Leu Phe Glu Tyr Val
250 255 260

TAT TTC GCT CGT GCT GAC AGC GTT CTG GAC GGT ATT TCC GTT TAC CAT 2606
Tyr Phe Ala Arg Ala Asp Ser Val Leu Asp Gly Ile Ser Val Tyr His
265 270 275 280

ACA CGC CTG TTG ATG GGT ATC AAA CTT GCC GAG AAC ATC AAA AAA CAG 2654
Thr Arg Leu Leu Met Gly Ile Lys Leu Ala Glu Asn Ile Lys Lys Gln
285 290 295

ATC GAT CTG GAC GAA ATT GAC GTT GTT GTA TCT GTT CCT GAC ACT GCA 2702
Ile Asp Leu Asp Glu Ile Asp Val Val Val Ser Val Pro Asp Thr Ala
300 305 310

CGT ACC TGT GCA TTG GAG TGT GCC AAC CAT TTA AAC AAA CCT TAT CGC 2750
Arg Thr Cys Ala Leu Glu Cys Ala Asn His Leu Asn Lys Pro Tyr Arg
315 320 325

GAA GGA TTT GTC AAG AAC AGA TAT GTT GGA AGA ACA TTT ATC ATG CCA 2798
Glu Gly Phe Val Lys Asn Arg Tyr Val Gly Arg Thr Phe Ile Met Pro
330 335 340

AAC CAA AAA GAG CGA GTA TCT TCT GTG CGC CGC AAG TTG AAC CCA ATG 2846
Asn Gln Lys Glu Arg Val Ser Ser Val Arg Arg Lys Leu Asn Pro Met
345 350 355 360

AAC TCA GAA TTT AAA GAC AAG CGC GTG CTG ATT GTC GAT GAT TCC ATT 2894
Asn Ser Glu Phe Lys Asp Lys Arg Val Leu Ile Val Asp Asp Ser Ile
365 370 375

GTG CGA GGT ACC ACT TCC AAA GAG ATT GTT AAC ATG GCG AAG GAA TCC 2942
Val Arg Gly Thr Thr Ser Lys Glu Ile Val Asn Met Ala Lys Glu Ser
380 385 390

GGT GCT GCC AAG GTC TAC TTT GCC TCT GCA GCG CCA GCA ATT CGT TTC 2990
Gly Ala Ala Lys Val Tyr Phe Ala Ser Ala Ala Pro Ala Ile Arg Phe
395 400 405

AAT CAC ATC TAC GGG ATT GAC CTA GCA GAT ACT AAG CAG CTT GTC GCC 3038
Asn His Ile Tyr Gly Ile Asp Leu Ala Asp Thr Lys Gln Leu Val Ala
410 415 420

TAC AAC AGA ACT GTT GAA GAA ATC ACT GCG GAG CTG GGC TGT GAC CGC 3086
Tyr Asn Arg Thr Val Glu Glu Ile Thr Ala Glu Leu Gly Cys Asp Arg
425 430 435 440

GTC ATC TAT CAA TCT TTG GAT GAC CTC ATC GAC TGT TGC AAG ACA GAC 3134
Val Ile Tyr Gln Ser Leu Asp Asp Leu Ile Asp Cys Cys Lys Thr Asp
445 450 455

ATC ATC TCA GAA TTT GAA GTT GGA GTT TTC ACT GGT AAC TAC GTT ACA 3182
Ile Ile Ser Glu Phe Glu Val Gly Val Phe Thr Gly Asn Tyr Val Thr
460 465 470

GGT GTT GAG GAT GTG TAC TTG CAG GAA TTA GAA CGT TGC CGC GCT CTT 3230
Gly Val Glu Asp Val Tyr Leu Gln Glu Leu Glu Arg Cys Arg Ala Leu
475 480 485

AAT AAC TCG AAT AAG GGT GAA GCG AAG GCC GAG GTT GAT ATT GGT CTC 3278
Asn Asn Ser Asn Lys Gly Glu Ala Lys Ala Glu Val Asp Ile Gly Leu
490 495 500

TAC AAT TCT GCC GAC TAT TAGCGGCGCC GTTGCCGGCA TCCGGCCCCA 3326
Tyr Asn Ser Ala Asp Tyr
505 510

TATATAGACT CATCGGGACC TAAAATAAGC CTTTACAGAT CATTATCTAC AAATATAGAT 3386

ACCATTAAAA GCCTGACTTT CGACTTACTC CTAGCACACC CCGTTGTATC CCTGTGCTTG 3446

CTTTCTTAAA TGCCGTTGGT TAGGCTTTGG ACTTAGCGTC CCGCCCATTT TCTAGCATGT 3506

GCAGATCTAG CAAATTTGGC CTAAGACAAG AAGATCCATT CGGCACCCAC ATCCTGGAGC 3566

CAGCACACAG TGGACCCAGA C ATG AGC AGC GGC AAT ATA TGG AAG CAA TTG 3617
Met Ser Ser Gly Asn Ile Trp Lys Gln Leu
1 5 10



CTA GAG GAG AAT AGC GAA CAG CTG GAC CAG TCC ACT ACG GAG ACT TAC 3665
Leu Glu Glu Asn Ser Glu Gln Leu Asp Gln Ser Thr Thr Glu Thr Tyr
15 20 25

GTG GTA TGC TGC GAG AAC GAA GAT TCC CTT AAC CAG TTT TTG CAA CAA 3713
Val Val Cys Cys Glu Asn Glu Asp Ser Leu Asn Gln Phe Leu Gln Gln
30 35 40

TGT TGG CAG ATT GAC GAG GGC GAG AAG GTG ACC AAC CTG GAG CCG TTG 3761
Cys Trp Gln Ile Asp Glu Gly Glu Lys Val Thr Asn Leu Glu Pro Leu
45 50 55

GGA TTC TTT ACA AAG GTG GTT TCG CGC GAC GAA GAG AAC CTC CGG CTC 3809
Gly Phe Phe Thr Lys Val Val Ser Arg Asp Glu Glu Asn Leu Arg Leu
60 65 70

AAC GTA TAC TAT GCC AAG AGC CCA CTG GAT GCA CAG ACG CTG CAG TTT 3857
Asn Val Tyr Tyr Ala Lys Ser Pro Leu Asp Ala Gln Thr Leu Gln Phe
75 80 85 90

CTG GGC GTG TTC CTG CGC CAA ATG GAA ACC TCA CAA ATA CGT TGG ATC 3905
Leu Gly Val Phe Leu Arg Gln Met Glu Thr Ser Gln Ile Arg Trp Ile
95 100 105

TTC CTA CTG GAC TGG CTG CTA GAC GAT AAA CGA TTA TGG CTA CGT CAA 3953
Phe Leu Leu Asp Trp Leu Leu Asp Asp Lys Arg Leu Trp Leu Arg Gln
110 115 120

CTG CGG AAC TCG TGG GCC GCC TTG GAG GAA GCG CAG GTG GCA CCC TTT 4001
Leu Arg Asn Ser Trp Ala Ala Leu Glu Glu Ala Gln Val Ala Pro Phe
125 130 135

CCA GGT GGC GCT GTG GTG GTG GTC CTC AAC CCG AGT CAC GTG ACA CAA 4049
Pro Gly Gly Ala Val Val Val Val Leu Asn Pro Ser His Val Thr Gln
140 145 150

CTG GAG CGA AAC ACG ATG GTT TGG AAC TCC CGC CGT CTG GAC CTG GTA 4097
Leu Glu Arg Asn Thr Met Val Trp Asn Ser Arg Arg Leu Asp Leu Val
155 160 165 170

CAC CAG ACA CTG CGA GCT GCA TGC CTC AAC ACC GGC TCG GCG CTA GTT 4145
His Gln Thr Leu Arg Ala Ala Cys Leu Asn Thr Gly Ser Ala Leu Val
175 180 185

ACA CTT GAT CCT AAT ACT GCG CGC GAA GAC GTC ATG CAC ATA TGT GCG 4193
Thr Leu Asp Pro Asn Thr Ala Arg Glu Asp Val Met His Ile Cys Ala
190 195 200

CTG CTT GCG GGG CTG CCT ACA TCC CGT CCC GTC GCG ATG CTA AGC CTG 4241
Leu Leu Ala Gly Leu Pro Thr Ser Arg Pro Val Ala Met Leu Ser Leu
205 210 215

CAA AGT CTA TTC ATC CCC CAC GGT GCA GAT TCC ATC GGC AAG ATC TGC 4289
Gln Ser Leu Phe Ile Pro His Gly Ala Asp Ser Ile Gly Lys Ile Cys
220 225 230

ACC ATC GCG CCC GAG TTC CCT GTT GCT ACG GTG TTC GAC AAC GAT TTT 4337
Thr Ile Ala Pro Glu Phe Pro Val Ala Thr Val Phe Asp Asn Asp Phe
235 240 245 250

GTG AGC TCG ACA TTC GAG GCC GCA ATT GCT CCA GAA CTT ACT CCA GGA 4385
Val Ser Ser Thr Phe Glu Ala Ala Ile Ala Pro Glu Leu Thr Pro Gly
255 260 265

CCA CGT GTG CCA TCT GAC CAC CCA TGG CTA ACA GAG CCT ACC AAC CCC 4433
Pro Arg Val Pro Ser Asp His Pro Trp Leu Thr Glu Pro Thr Asn Pro
270 275 280

CCT TCG GAG GCA ACC GCT TGG CAT TTC GAT CTC CAA GGT CGC CTC GCT 4481
Pro Ser Glu Ala Thr Ala Trp His Phe Asp Leu Gln Gly Arg Leu Ala
285 290 295



ACC CTA TAC CGG CAT CTT GGT GAC TCT AAC AAG GCC ATA TCT GTT ACT 4529
Thr Leu Tyr Arg His Leu Gly Asp Ser Asn Lys Ala Ile Ser Val Thr
300 305 310

CAG CAC CGC TTC CAC AAG CCC CGC TCG GAA GAT TAT GCA TAC GAA TTC 4577
Gln His Arg Phe His Lys Pro Arg Ser Glu Asp Tyr Ala Tyr Glu Phe
315 320 325 330

GAG CTG CCG TCT AAG CAC CCT ACA ATA CGT GAC CTC ATA CGC TCT GCC 4625
Glu Leu Pro Ser Lys His Pro Thr Ile Arg Asp Leu Ile Arg Ser Ala
335 340 345

GCA GCC GAC TCA CCG AAC GAC GTC GCT GAC TCC ATC GAT GGG CTT ATG 4673
Ala Ala Asp Ser Pro Asn Asp Val Ala Asp Ser Ile Asp Gly Leu Met
350 355 360

GAT GGT ATC GTA CAA AGG AAT GTT CAT TGACGTCGAC ACAAAAATTT 4720
Asp Gly Ile Val Gln Arg Asn Val His

TGTTACTGTT CTCTCGAGAA CTATTCTCAT CCAGTACTGA CATATTAGAA GGCGAAGTGA 4780

ACTAGGATTT ATATAAAGTA GCCTTCAGGC AATTGCACAG GGTCTATTGA GTCGCTGCCG 4840

TTCACGAGAG AGCCCAATAT ATCGAGGACT AATTGGTCAC TTTTGTTTTG CTATACTCAC 4900

CCTGTATTTG CTAATCATTT ATCCGCTTTG TCCAAGTGGT TGCGAAGATA TCGAGCCAGA 4960

ACATTAGAAT CTGGTTTGCC GCATCCTAGA GCTGTCTCCA AGCCAGTTGA ACCGTTGCGG 5020

GAGATTACCG CAGCCGGTTT GATCAGAGTA CTGGTGACTG CCAGCACCCA CGTTTGTGAC 5080

TTATAAATAT ACGCCCTGTG GAGCCATAGC CATTGGCATA AAGAGAAGAG CACCCCGTGC 5140

CACGATGCAG ACACTTCCGG TGTACCCAGC GTCACAGACT GCGTCGCCTA CGAAGCGTGA 5200

ACTTGCAGCG GCGCCCTCGG TGCCGCAGGA CGGCGCCCGG CTGCCTGCGC AGCTCACTTT 5260

AGTGACGCCC CCAGAACCTG ATATCCAGAA GAAGTCAGTG CGATCTCAGG TCGCGCGTTT 5320

AAGCATCTCG GAGACAGATG TAGTGAAGAG TGATATCGTG GCTAAGCTT 5369

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 475 Amino acids
(B) TYPE: Amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:



Met Asp Arg Gly Cys Lys Gly Ile Ser Tyr Val Leu Ser Ala Met Val
1 5 10 15

Phe His Ile Ile Pro Ile Thr Phe Glu Ile Ser Met Val Cys Gly Ile
20 25 30

Leu Thr Tyr Gln Phe Gly Ala Ser Phe Ala Ala Ile Thr Phe Ser Thr
35 40 45

Met Leu Leu Tyr Ser Ile Phe Thr Phe Arg Thr Thr Ala Trp Arg Thr
50 55 60

Arg Phe Arg Arg Asp Ala Asn Lys Ala Asp Asn Lys Ala Ala Ser Val
65 70 75 80

Ala Leu Asp Ser Leu Ile Asn Phe Glu Ala Val Lys Tyr Phe Asn Asn
85 90 95

Glu Lys Tyr Leu Ala Asp Lys Tyr His Thr Ser Leu Met Lys Tyr Arg
100 105 110

Asp Ser Gln Ile Lys Val Ser Gln Ser Leu Ala Phe Leu Asn Thr Gly
115 120 125

Gln Asn Leu Ile Phe Thr Thr Ala Leu Thr Ala Met Met Tyr Met Ala
130 135 140

Cys Asn Gly Val Met Gln Gly Ser Leu Thr Val Gly Asp Leu Val Leu
145 150 155 160

Ile Asn Gln Leu Val Phe Gln Leu Ser Val Pro Leu Asn Phe Leu Gly
165 170 175

Ser Val Tyr Arg Asp Leu Lys Gln Ser Leu Ile Asp Met Glu Ser Leu
180 185 190

Phe Lys Leu Gln Lys Asn Gln Val Thr Ile Lys Asn Ser Pro Asn Ala
195 200 205

Gln Asn Leu Pro Ile His Lys Pro Leu Asp Ile Arg Phe Glu Asn Val
210 215 220

Thr Phe Gly Tyr Asp Pro Glu Arg Arg Ile Leu Asn Asn Val Ser Phe
225 230 235 240

Thr Ile Pro Ala Gly Met Lys Thr Ala Ile Val Gly Pro Ser Gly Ser
245 250 255

Gly Lys Ser Thr Ile Leu Lys Leu Val Phe Arg Phe Tyr Glu Pro Glu
260 265 270

Gln Gly Arg Ile Leu Val Gly Gly Thr Asp Ile Arg Asp Leu Asp Leu
275 280 285

Leu Ser Leu Arg Lys Ala Ile Gly Val Val Pro Gln Asp Thr Pro Leu
290 295 300

Phe Asn Asp Thr Ile Trp Glu Asn Val Lys Phe Gly Asn Ile Ser Ser
305 310 315 320

Ser Asp Asp Glu Ile Leu Arg Ala Ile Glu Lys Ala Gln Leu Thr Lys
325 330 335

Leu Leu Gln Asn Leu Pro Lys Gly Ala Ser Thr Val Val Gly Glu Arg
340 345 350

Gly Leu Met Ile Ser Gly Gly Glu Lys Gln Arg Leu Ala Ile Ala Arg
355 360 365

Val Leu Leu Lys Asp Ala Pro Leu Met Phe Phe Asp Glu Ala Thr Ser
370 375 380

Ala Leu Asp Thr His Thr Glu Gln Ala Leu Leu His Thr Ile Gln Gln
385 390 395 400

Asn Phe Ser Ser Asn Ser Lys Thr Ser Val Tyr Val Ala His Arg Leu
405 410 415

Arg Thr Ile Ala Asp Ala Asp Lys Ile Ile Val Leu Glu Gln Gly Ser
420 425 430

Val Arg Glu Glu Gly Thr His Ser Ser Leu Leu Ala Ser Gln Gly Ser
435 440 445

Leu Tyr Arg Gly Leu Trp Asp Ile Gln Glu Asn Leu Thr Leu Pro Glu
450 455 460

Arg Pro Glu Gln Ser Thr Gly Ser Gln His Ala
465 470 475

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 510 Amino acids
(B) TYPE: Amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Met Cys Gly Ile Leu Gly Val Val Leu Ala Asp Gln Ser Lys Val Val
1 5 10 15

Ala Pro Glu Leu Phe Asp Gly Ser Leu Phe Leu Gln His Arg Gly Gln
20 25 30

Asp Ala Ala Gly Ile Ala Thr Cys Gly Pro Gly Gly Arg Leu Tyr Gln
35 40 45

Cys Lys Gly Asn Gly Met Ala Arg Asp Val Phe Thr Gln Ala Arg Met
50 55 60

Ser Gly Leu Val Gly Ser Met Gly Ile Ala His Leu Arg Tyr Pro Thr
65 70 75 80

Ala Gly Ser Ser Ala Asn Ser Glu Ala Gln Pro Phe Tyr Val Asn Ser
85 90 95

Pro Tyr Gly Ile Cys Met Ser His Asn Gly Asn Leu Val Asn Thr Met
100 105 110

Ser Leu Arg Arg Tyr Leu Asp Glu Asp Val His Arg His Ile Asn Thr
115 120 125

Asp Ser Asp Ser Glu Leu Leu Leu Asn Ile Phe Ala Ala Glu Leu Glu
130 135 140

Lys Tyr Asn Lys Tyr Arg Val Asn Asn Asp Asp Ile Phe Cys Ala Leu
145 150 155 160

Glu Gly Val Tyr Lys Arg Cys Arg Gly Gly Tyr Ala Cys Val Gly Met
165 170 175

Leu Ala Gly Tyr Gly Leu Phe Gly Phe Arg Asp Pro Asn Gly Ile Arg
180 185 190

Pro Leu Leu Phe Gly Glu Arg Val Asn Asp Asp Gly Thr Met Asp Tyr
195 200 205

Met Leu Ala Ser Glu Ser Val Val Leu Lys Ala His Arg Phe Gln Asn
210 215 220
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lie Arg Asp lie Leu Pro Gly Gin Ala Val lie lie Pro Lys Thr Cys 
225 230 235 240 

Gly Ser Ser Pro Pro Glu Phe Arg Gin Val Val Pro He Glu Ala Tyr 
245 250 255 

Lys Pro Asp Leu Phe Glu Tyr Val Tyr Phe Ala Arg Ala Asp Ser Val 
260 265 270 

Leu Asp Gly He Ser Val Tyr His Thr Arg Leu Leu Met Gly He Lys 
275 280 285 

Leu Ala Glu Asn He Lys Lys Gin lie Asp Leu Asp Glu He Asp Val 
290 295 300 

Val Val Ser Val Pro Asp Thr Ala Arg Thr Cys Ala Leu Glu Cys Ala 
305 310 315 320 



Q 

tew 

si 



Asn His Leu Asn Lys Pro Tyr Arg Glu Gly Phe Val Lys Asn Arg Tyr 
325 330 335 

Val Gly Arg Thr Phe He Met Pro Asn Gin Lys Glu Arg Val Ser Ser 
340 345 350 



m 

HI 



Val Arg Arg Lys Leu Asn Pro Met Asn Ser Glu Phe Lys Asp Lys Arg
355 360 365

Val Leu Ile Val Asp Asp Ser Ile Val Arg Gly Thr Thr Ser Lys Glu
370 375 380

Ile Val Asn Met Ala Lys Glu Ser Gly Ala Ala Lys Val Tyr Phe Ala
385 390 395 400

Ser Ala Ala Pro Ala Ile Arg Phe Asn His Ile Tyr Gly Ile Asp Leu
405 410 415

Ala Asp Thr Lys Gln Leu Val Ala Tyr Asn Arg Thr Val Glu Glu Ile
420 425 430

Thr Ala Glu Leu Gly Cys Asp Arg Val Ile Tyr Gln Ser Leu Asp Asp
435 440 445

Leu Ile Asp Cys Cys Lys Thr Asp Ile Ile Ser Glu Phe Glu Val Gly
450 455 460

Val Phe Thr Gly Asn Tyr Val Thr Gly Val Glu Asp Val Tyr Leu Gln
465 470 475 480

Glu Leu Glu Arg Cys Arg Ala Leu Asn Asn Ser Asn Lys Gly Glu Ala
485 490 495

Lys Ala Glu Val Asp Ile Gly Leu Tyr Asn Ser Ala Asp Tyr
500 505 510

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 371 Amino acids 

(B) TYPE: Amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Ser Ser Gly Asn Ile Trp Lys Gln Leu Leu Glu Glu Asn Ser Glu
1 5 10 15

Gln Leu Asp Gln Ser Thr Thr Glu Thr Tyr Val Val Cys Cys Glu Asn
20 25 30

Glu Asp Ser Leu Asn Gln Phe Leu Gln Gln Cys Trp Gln Ile Asp Glu
35 40 45

Gly Glu Lys Val Thr Asn Leu Glu Pro Leu Gly Phe Phe Thr Lys Val
50 55 60

Val Ser Arg Asp Glu Glu Asn Leu Arg Leu Asn Val Tyr Tyr Ala Lys
65 70 75 80

Ser Pro Leu Asp Ala Gln Thr Leu Gln Phe Leu Gly Val Phe Leu Arg
85 90 95

Gln Met Glu Thr Ser Gln Ile Arg Trp Ile Phe Leu Leu Asp Trp Leu
100 105 110

Leu Asp Asp Lys Arg Leu Trp Leu Arg Gln Leu Arg Asn Ser Trp Ala
115 120 125

Ala Leu Glu Glu Ala Gln Val Ala Pro Phe Pro Gly Gly Ala Val Val
130 135 140

Val Val Leu Asn Pro Ser His Val Thr Gln Leu Glu Arg Asn Thr Met
145 150 155 160

Val Trp Asn Ser Arg Arg Leu Asp Leu Val His Gln Thr Leu Arg Ala
165 170 175



Ala Cys Leu Asn Thr Gly Ser Ala Leu Val Thr Leu Asp Pro Asn Thr
180 185 190

Ala Arg Glu Asp Val Met His Ile Cys Ala Leu Leu Ala Gly Leu Pro
195 200 205

Thr Ser Arg Pro Val Ala Met Leu Ser Leu Gln Ser Leu Phe Ile Pro
210 215 220

His Gly Ala Asp Ser Ile Gly Lys Ile Cys Thr Ile Ala Pro Glu Phe
225 230 235 240

Pro Val Ala Thr Val Phe Asp Asn Asp Phe Val Ser Ser Thr Phe Glu
245 250 255

Ala Ala Ile Ala Pro Glu Leu Thr Pro Gly Pro Arg Val Pro Ser Asp
260 265 270

His Pro Trp Leu Thr Glu Pro Thr Asn Pro Pro Ser Glu Ala Thr Ala
275 280 285

Trp His Phe Asp Leu Gln Gly Arg Leu Ala Thr Leu Tyr Arg His Leu
290 295 300

Gly Asp Ser Asn Lys Ala Ile Ser Val Thr Gln His Arg Phe His Lys
305 310 315 320

Pro Arg Ser Glu Asp Tyr Ala Tyr Glu Phe Glu Leu Pro Ser Lys His
325 330 335

Pro Thr Ile Arg Asp Leu Ile Arg Ser Ala Ala Ala Asp Ser Pro Asn
340 345 350

Asp Val Ala Asp Ser Ile Asp Gly Leu Met Asp Gly Ile Val Gln Arg
355 360 365

Asn Val His
370

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3616 base pairs

(B) TYPE: Nucleic acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(ix) FEATURES:

(A) NAME/KEY: 5'UTR

(B) LOCATION: 1..863

(ix) FEATURES:

(A) NAME/KEY: CDS

(B) LOCATION: 864..1316

(ix) FEATURES:

(A) NAME/KEY: intron

(B) LOCATION: 1317..1477

(ix) FEATURES:

(A) NAME/KEY: CDS

(B) LOCATION: 1478..2592

(ix) FEATURES:

(A) NAME/KEY: 3'UTR

(B) LOCATION: 2593..3616

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

GGGCCCGGTG CCAGCTCGCC AGGTGCGGAC TCGCGCTCGG GCTGTGGGCG CTCTACCTGC 60 

TGCTGCTCGG CAGCTGCCTG ACGCGCGCGT ACGAGCTGTC GGATCTCGAA AACCTGGAAT 120 

CCGATTACTA CAGCTACGTG CTGGATGTGA ACTTCGCGCT GCTGAGCGCC ATGAGCGCGA 180

CCGGCCTCGC GATGGGCGCC GTGAGCGGCT CCCTCGGGAG CGCGCCGGTG CTCGCGCAGT 240 

GGCCGGCAGC GATCTGGGCC GTGCGCTTCC TGCGCGCCGC GGGCTATGTC GCGATAGTCC 300 

TAATCCTGCC GTTCCTGTCC GTCGTCGCAT TCCTGCAGCC GCTCTGCGAG CGCGCGCTGG 360 

CGCTGTTCCC GTTTGTGCGC GCGTGGGGCA TGGACGGCGT GTTCAACTTC CTGCTGCTCT 420 

CCGCCGTGCT CTGGACTGTA TTCCTGGCCG TTCGCCTGCT CCGCGCCGTC TACAGACTGC 480 

TGCGCTGGCT GGTCGGTCTT TTGGTCCGCC TGGCACGCCT GCTGCTGCGA GGCGCCCGTC 540 

GGACGCCTGC GGCGGCCCCC GAGGAGCCCG TCTAGCGTGC GCGCGTTCTA GGCCCCTGAC 600 



AGCTCCTACC TGGTGCTGGC CGCCGGTAGG GCTCGCATCG TGCGGCGCAG GCCCATTGCT 660

TTTTGGCCCC CGCTGGATCA TCGTTTCTTT TACGTGAAAA GTTTGCAGCG ATGAGCTGCA 720

GTATAAATAG GTTTTCTAGA TGCGCCAAAT CCCAGCTGGG TTTACCGGCG TCTGTTCGGG 780 

ATAGTTACTT GATGGATGGG TCAACTTGAG AGCTTGGGTT TAGTGTTGAC TCCTTCTCTT 840 

CATAGCACGC CGAACAAAGC GCA ATG ACT TAC AGA GAC GCA GCC ACG GCA 890 

Met Thr Tyr Arg Asp Ala Ala Thr Ala 
1 5 

CTG GAG CAC CTG GCG ACG TAC GCC GAG AAG GAC GGG CTG TCC GTG GAG 938 
Leu Glu His Leu Ala Thr Tyr Ala Glu Lys Asp Gly Leu Ser Val Glu 
10 15 20 25 

CAG TTG ATG GAC TCC AAG ACG CGG GGC GGG TTG ACG TAC AAC GAC TTC 986
Gln Leu Met Asp Ser Lys Thr Arg Gly Gly Leu Thr Tyr Asn Asp Phe
30 35 40

CTG GTC TTG CCG GGC AAG ATC GAC TTC CCA TCG TCG GAG GTG GTG CTG 1034
Leu Val Leu Pro Gly Lys Ile Asp Phe Pro Ser Ser Glu Val Val Leu
45 50 55

TCG TCG CGC CTG ACC AAG AAG ATC ACC TTG AAC GCG CCG TTT GTG TCG 1082
Ser Ser Arg Leu Thr Lys Lys Ile Thr Leu Asn Ala Pro Phe Val Ser
60 65 70

TCG CCG ATG GAC ACG GTG ACG GAG GCC GAC ATG GCG ATC CAC ATG GCG 1130
Ser Pro Met Asp Thr Val Thr Glu Ala Asp Met Ala Ile His Met Ala
75 80 85

CTC CTG GGC GGC ATC GGG ATC ATC CAC CAC AAC TGC ACT GCG GAG GAG 1178
Leu Leu Gly Gly Ile Gly Ile Ile His His Asn Cys Thr Ala Glu Glu
90 95 100 105

CAG GCG GAG ATG GTG CGC CGG GTC AAG AAG TAC GAA AAC GGG TTC ATC 1226
Gln Ala Glu Met Val Arg Arg Val Lys Lys Tyr Glu Asn Gly Phe Ile
110 115 120

AAC GCC CCC GTG GTC GTG GGG CCG GAC GCG ACG GTG GCG GAC GTG CGC 1274 
Asn Ala Pro Val Val Val Gly Pro Asp Ala Thr Val Ala Asp Val Arg 
125 130 135 

CGG ATG AAG AAC GAG TTT GGG TTT GCA GGA TTT CCT GTG ACA 1316 
Arg Met Lys Asn Glu Phe Gly Phe Ala Gly Phe Pro Val Thr 
140 145 150 

GGTATGTTAG AGTGGCACGC GGGGCTGCAC GCTGGGATGA TGATCATAAA TCAATAACTT 1376 

TCGTTCTACT GACTGCGATC AAACGATCGT GTAGACACCT TTTACTCTGA CCGCAGACGT 1436 



GCAGCGCCTT TTTGGCAGGA ACATGTACTA ACACATCAGC A GAT GAT GGC AAG 1489
Asp Asp Gly Lys
1

CCG ACC GGG AAG CTG CAG GGG ATC ATC ACG TCC CGT GAC ATC CAG TTT 1537
Pro Thr Gly Lys Leu Gln Gly Ile Ile Thr Ser Arg Asp Ile Gln Phe
5 10 15 20

GTC GAG GAC GAG ACC CTG CTT GTG TCT GAG ATC ATG ACC AAG GAC GTC 1585
Val Glu Asp Glu Thr Leu Leu Val Ser Glu Ile Met Thr Lys Asp Val
25 30 35

ATC ACT GGG AAG CAG GGC ATC AAC CTC GAG GAG GCG AAC CAG ATC CTG 1633
Ile Thr Gly Lys Gln Gly Ile Asn Leu Glu Glu Ala Asn Gln Ile Leu
40 45 50

AAG AAC ACC AAG AAG GGC AAG CTG CCA ATT GTG GAC GAG GCG GGC TGC 1681
Lys Asn Thr Lys Lys Gly Lys Leu Pro Ile Val Asp Glu Ala Gly Cys
55 60 65



CTG GTG TCC ATG CTT TCG AGA ACT GAC TTG ATG AAG AAC CAG TCC TAC 1729
Leu Val Ser Met Leu Ser Arg Thr Asp Leu Met Lys Asn Gln Ser Tyr
70 75 80

CCA TTG GCC TCC AAG TCT GCC GAC ACC AAG CAG CTG CTC TGT GGT GCT 1777
Pro Leu Ala Ser Lys Ser Ala Asp Thr Lys Gln Leu Leu Cys Gly Ala
85 90 95 100

GCG ATC GGC ACC ATC GAC GCG GAC AGG CAG AGA CTG GCG ATG CTG GTC 1825
Ala Ile Gly Thr Ile Asp Ala Asp Arg Gln Arg Leu Ala Met Leu Val
105 110 115

GAG GCC GGT CTG GAC GTT GTT GTG CTA GAC TCC TCG CAG GGT AAC TCG 1873
Glu Ala Gly Leu Asp Val Val Val Leu Asp Ser Ser Gln Gly Asn Ser
120 125 130

GTC TTC CAG ATC AAC ATG ATC AAG TGG ATC AAG GAG ACC TTC CCA GAC 1921
Val Phe Gln Ile Asn Met Ile Lys Trp Ile Lys Glu Thr Phe Pro Asp
135 140 145

CTG CAG GTC ATT GCT GGC AAC GTG GTC ACC AGA GAG CAG GCT GCC AGC 1969
Leu Gln Val Ile Ala Gly Asn Val Val Thr Arg Glu Gln Ala Ala Ser
150 155 160

TTG ATC CAC GCC GGC GCA GAC GGG TTG CGT ATC GGT ATG GGC TCT GGC 2017
Leu Ile His Ala Gly Ala Asp Gly Leu Arg Ile Gly Met Gly Ser Gly
165 170 175 180



TCC ATC TGT ATC ACT CAG GAG GTG ATG GCC TGT GGT AGA CCA CAG GGT 2065
Ser Ile Cys Ile Thr Gln Glu Val Met Ala Cys Gly Arg Pro Gln Gly
185 190 195

ACC GCT GTC TAC AAC GTC ACG CAG TTC GCC AAC CAG TTT GGT GTG CCA 2113
Thr Ala Val Tyr Asn Val Thr Gln Phe Ala Asn Gln Phe Gly Val Pro
200 205 210

TGT ATT GCT GAC GGT GGT GTC CAG AAC ATC GGG CAC ATT ACC AAA GCT 2161
Cys Ile Ala Asp Gly Gly Val Gln Asn Ile Gly His Ile Thr Lys Ala
215 220 225

ATC GCT CTT GGC GCG TCC ACC GTC ATG ATG GGC GGT ATG CTG GCA GGC 2209
Ile Ala Leu Gly Ala Ser Thr Val Met Met Gly Gly Met Leu Ala Gly
230 235 240

ACT ACA GAG TCT CCA GGC GAG TAC TTC TTC AGG GAC GGG AAG AGA CTG 2257
Thr Thr Glu Ser Pro Gly Glu Tyr Phe Phe Arg Asp Gly Lys Arg Leu
245 250 255 260

AAG ACC TAC AGA GGT ATG GGC TCC ATC GAC GCC ATG CAA AAG ACT GAT 2305
Lys Thr Tyr Arg Gly Met Gly Ser Ile Asp Ala Met Gln Lys Thr Asp
265 270 275

GTC AAG GGT AAC GCC GCT ACC TCC CGT TAC TTC TCT GAG TCT GAC AAG 2353
Val Lys Gly Asn Ala Ala Thr Ser Arg Tyr Phe Ser Glu Ser Asp Lys
280 285 290



GTT CTG GTC GCT CAG GGT GTT ACT GGT TCT GTG ATC GAC AAG GGC TCC 2401
Val Leu Val Ala Gln Gly Val Thr Gly Ser Val Ile Asp Lys Gly Ser
295 300 305

ATC AAG AAG TAC ATT CCA TAT CTG TAC AAT GGT CTA CAG CAC TCG TGC 2449
Ile Lys Lys Tyr Ile Pro Tyr Leu Tyr Asn Gly Leu Gln His Ser Cys
310 315 320

CAG GAT ATC GGT GTG CGC TCT CTA GTG GAG TTC AGA GAG AAG GTG GAC 2497
Gln Asp Ile Gly Val Arg Ser Leu Val Glu Phe Arg Glu Lys Val Asp
325 330 335 340

TCT GGC TCG GTC AGA TTT GAG TTC AGA ACT CCA TCT GCC CAG TTG GAG 2545
Ser Gly Ser Val Arg Phe Glu Phe Arg Thr Pro Ser Ala Gln Leu Glu
345 350 355

GGT GGT GTG CAC AAC TTG CAC TCC TAC GAG AAG CGC CTA TTT GAC TGAGTGC 2597
Gly Gly Val His Asn Leu His Ser Tyr Glu Lys Arg Leu Phe Asp
360 365 370

CACTAGGCCC ACACTATAGA AGTGGATCCG GGCGCGATGG CACCCATACT TTTATATTAT 2657

GTTGATTGAT GTACGTAAAC GATAGATATA ATAACAGACG CGGCATCTCA TTTGTATGCA 2717

ATATATCTGG AACATGGTTA TGCGTACTCA ACTGTATGTA CTACTTTATA TACACAGCTC 2777

TGGGACACTT GGTGAGATAT ATGTTTCATT ATGTATGCCT CGCTATCGAA AGGTCTGGCA 2837 

TTATGGGCTA CTGGGTCTAA GAGTCATGGC TTATGAGTAT TTATTTATTT ATTTCTCTTC 2897 

CTTTTCATTA AACTCCTCGA GCTTCTTTCT GTAATACTGC TCTCTAGACT TCTCCACATC 2957 

TGCTAATGAT GGTGGAAGTC GTTCGTTTTC CAAATCCGCT CTACGAGCGC GCTCGAAGTT 3017 

AGACAGCGCC TCGTTCAGAC CTTCAGACCC GCGTGACAGC GCTCCACGAG GCAGCACGCC 3077 

AGAATTCATT GTTTTTAGGT ACTGCACCTT ATCGCTCTCT TCTCTCAACA CGCTATACAT 3137 

TCGGGAAACC TTGGCAATCG CCAATATTTT ACTGCGTAGT GCACGCCGTT TTGCATCATC 3197 

GTCCAGAATA GACCGTTTTT TCTTCGATTT CTTGGAGCCA GGTATAACAG TTACAACCTG 3257 

CTCAGTGTTT TTGGACTTCA ATGTAGCACC TAAGTCCTCC CTTATAACAA AAGTCTCTTC 3317

CTCCAATTCT TCTTCAGTAC AAATGTTTAA TATCGAAACC AACATTTCAG TCACTTTCTC 3377

GCCAACAAAT GGCAAAGACC AGGTGAATAC GTCCATGAAA TTCGGTAACC AATACGGATG 3437

CTGTGACATG TTAAATTGTC TAATGTTCAT AACGTTATCC GAGTATTTTA GGACCGCGGC 3497

CTTGTTCTTG TAAGTGTCCA AGTAGTTGGG TGCGCTGAAC AACGTAAGTA AACTAGGAAA 3557

GCCCAGATTC TTGGTATTCT TGTACATTCT GTAGCCCTGA TCTTGGGCTT CGTGGGCCC 3616

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 151 Amino acids 

(B) TYPE: Amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Thr Tyr Arg Asp Ala Ala Thr Ala Leu Glu His Leu Ala Thr Tyr
1 5 10 15

Ala Glu Lys Asp Gly Leu Ser Val Glu Gln Leu Met Asp Ser Lys Thr
20 25 30

Arg Gly Gly Leu Thr Tyr Asn Asp Phe Leu Val Leu Pro Gly Lys Ile
35 40 45

Asp Phe Pro Ser Ser Glu Val Val Leu Ser Ser Arg Leu Thr Lys Lys
50 55 60

Ile Thr Leu Asn Ala Pro Phe Val Ser Ser Pro Met Asp Thr Val Thr
65 70 75 80

Glu Ala Asp Met Ala Ile His Met Ala Leu Leu Gly Gly Ile Gly Ile
85 90 95

Ile His His Asn Cys Thr Ala Glu Glu Gln Ala Glu Met Val Arg Arg
100 105 110

Val Lys Lys Tyr Glu Asn Gly Phe Ile Asn Ala Pro Val Val Val Gly
115 120 125

Pro Asp Ala Thr Val Ala Asp Val Arg Arg Met Lys Asn Glu Phe Gly
130 135 140

Phe Ala Gly Phe Pro Val Thr
145 150
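
Many sequence-analysis tools expect one-letter amino acid codes rather than the three-letter notation used in this listing. The following Python sketch is illustrative only and is not part of the original listing; the helper name `to_one_letter` is hypothetical, and the input is the first row of SEQ ID NO: 8 (residues 1 to 16).

```python
# Standard IUPAC three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_seq: str) -> str:
    """Convert a whitespace-separated three-letter sequence to one-letter codes.

    Raises KeyError on any token that is not a standard residue name, which
    also flags OCR damage such as "lie" or "Gin".
    """
    return "".join(THREE_TO_ONE[residue] for residue in three_letter_seq.split())


# First row of SEQ ID NO: 8 as printed in the listing.
first_row = "Met Thr Tyr Arg Asp Ala Ala Thr Ala Leu Glu His Leu Ala Thr Tyr"
print(to_one_letter(first_row))  # MTYRDAATALEHLATY
```

Because the lookup is strict, the same helper can serve as a quick check that a transcribed row contains only valid residue names.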

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 371 Amino acids 

(B) TYPE: Amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Asp Asp Gly Lys Pro Thr Gly Lys Leu Gln Gly Ile Ile Thr Ser Arg
1 5 10 15

Asp Ile Gln Phe Val Glu Asp Glu Thr Leu Leu Val Ser Glu Ile Met
20 25 30

Thr Lys Asp Val Ile Thr Gly Lys Gln Gly Ile Asn Leu Glu Glu Ala
35 40 45

Asn Gln Ile Leu Lys Asn Thr Lys Lys Gly Lys Leu Pro Ile Val Asp
50 55 60

Glu Ala Gly Cys Leu Val Ser Met Leu Ser Arg Thr Asp Leu Met Lys
65 70 75 80

Asn Gln Ser Tyr Pro Leu Ala Ser Lys Ser Ala Asp Thr Lys Gln Leu
85 90 95

Leu Cys Gly Ala Ala Ile Gly Thr Ile Asp Ala Asp Arg Gln Arg Leu
100 105 110

Ala Met Leu Val Glu Ala Gly Leu Asp Val Val Val Leu Asp Ser Ser
115 120 125

Gln Gly Asn Ser Val Phe Gln Ile Asn Met Ile Lys Trp Ile Lys Glu
130 135 140

Thr Phe Pro Asp Leu Gln Val Ile Ala Gly Asn Val Val Thr Arg Glu
145 150 155 160

Gln Ala Ala Ser Leu Ile His Ala Gly Ala Asp Gly Leu Arg Ile Gly
165 170 175

Met Gly Ser Gly Ser Ile Cys Ile Thr Gln Glu Val Met Ala Cys Gly
180 185 190

Arg Pro Gln Gly Thr Ala Val Tyr Asn Val Thr Gln Phe Ala Asn Gln
195 200 205

Phe Gly Val Pro Cys Ile Ala Asp Gly Gly Val Gln Asn Ile Gly His
210 215 220

Ile Thr Lys Ala Ile Ala Leu Gly Ala Ser Thr Val Met Met Gly Gly
225 230 235 240

Met Leu Ala Gly Thr Thr Glu Ser Pro Gly Glu Tyr Phe Phe Arg Asp
245 250 255

Gly Lys Arg Leu Lys Thr Tyr Arg Gly Met Gly Ser Ile Asp Ala Met
260 265 270

Gln Lys Thr Asp Val Lys Gly Asn Ala Ala Thr Ser Arg Tyr Phe Ser
275 280 285

Glu Ser Asp Lys Val Leu Val Ala Gln Gly Val Thr Gly Ser Val Ile
290 295 300

Asp Lys Gly Ser Ile Lys Lys Tyr Ile Pro Tyr Leu Tyr Asn Gly Leu
305 310 315 320

Gln His Ser Cys Gln Asp Ile Gly Val Arg Ser Leu Val Glu Phe Arg
325 330 335

Glu Lys Val Asp Ser Gly Ser Val Arg Phe Glu Phe Arg Thr Pro Ser
340 345 350

Ala Gln Leu Glu Gly Gly Val His Asn Leu His Ser Tyr Glu Lys Arg
355 360 365

Leu Phe Asp
370

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2697 base pairs

(B) TYPE: Nucleic acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(ix) FEATURES:

(A) NAME/KEY: 5'UTR

(B) LOCATION: 1..455

(ix) FEATURES:

(A) NAME/KEY: CDS

(B) LOCATION: 456..2033

(ix) FEATURES:

(A) NAME/KEY: 3'UTR

(B) LOCATION: 2034..2697

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

ATCGATTTCA GGAGATTTTT GGTAGCATTA TTGAGGTCAT TAGAGGCGTT CTGTGACTTT 60 

CGACGATTTG CACGCGCAGA AGAGGGCGTT CAACCAGCCT TTCGGATATT CCGGTTCGAG 120 

TTATACCAGC AGGGATCAGC GCAGGCACTA GAGTGGCGGG TGCTAATAAG AGGAGCAGGT 180 

CCTGGAACTG AAGTTGCAAG AGATAAGCAT TGCGCGGAGA AGGAGGCGGT TAGAGGGTGC 240 

AAGCGAGCAG GATGGGGTCT TCGATGAACT TCCCGTCTGG GTATGTGAAC AAGCACACGC 300 



TGCAGGCACA CCGGTAGGGC GAGTGCAGGG TGAAAAATAT ATATGCGCTC GAGAAGCGCT 360



GGGGATGAGT TCGTCTGCAA CGGCAGGCGG ATCTTCATCT GACAAAACCA GCTGCCTACA 420 

TCAGTGCGAA GCTGTTCAGT GATAGAATAG GAGTA ATG GCT GCT GTT GAA CAA 473 

Met Ala Ala Val Glu Gln
1 5

GTT TCT AGC GTG TTT GAG ACC ATT TTG GTG CTG GAC TTC GGG TCC CAG 521
Val Ser Ser Val Phe Asp Thr Ile Leu Val Leu Asp Phe Gly Ser Gln
10 15 20

TAC TCG CAT CTG ATC ACG CGG CGG CTG CGT GAG TTT AAT GTG TAC GCG 569
Tyr Ser His Leu Ile Thr Arg Arg Leu Arg Glu Phe Asn Val Tyr Ala
25 30 35

GAG ATG CTT CCG TGT ACG CAG AAG ATC AGC GAG CTG GGC TGG AAG CCA 617
Glu Met Leu Pro Cys Thr Gln Lys Ile Ser Glu Leu Gly Trp Lys Pro
40 45 50

AAG GGT GTG ATT TTG TCA GGC GGG CCG TAC TCC GTG TAC GCG GCA GAT 665
Lys Gly Val Ile Leu Ser Gly Gly Pro Tyr Ser Val Tyr Ala Ala Asp
55 60 65 70

GCT CCG CAC GTG GAC CGG GCG GTG TTC GAG TTG GGC GTT CCA ATT CTG 713
Ala Pro His Val Asp Arg Ala Val Phe Glu Leu Gly Val Pro Ile Leu
75 80 85

GGC ATC TGC TAC GGG CTA CAG GAG CTT GCG TGG ATA GCC GGC GCA GAG 761
Gly Ile Cys Tyr Gly Leu Gln Glu Leu Ala Trp Ile Ala Gly Ala Glu
90 95 100

GTG GGG CGC GGC GAG AAG CGC GAG TAC GGG CGC GCG ACG CTG CAC GTG 809
Val Gly Arg Gly Glu Lys Arg Glu Tyr Gly Arg Ala Thr Leu His Val
105 110 115

GAG GAC AGC GCG TGC CCG CTG TTC AAC AAC GTG GAC AGC AGC ACG GTG 857
Glu Asp Ser Ala Cys Pro Leu Phe Asn Asn Val Asp Ser Ser Thr Val
120 125 130

TGG ATG TCG CAC GGT GAC AAG CTG CAC GCA CTA CCT GCG GAT TTC CAC 905
Trp Met Ser His Gly Asp Lys Leu His Ala Leu Pro Ala Asp Phe His
135 140 145 150

GTC ACT GCG ACG ACG GAG AAC TCT CCT TTC TGC GGG ATT GCA CAC GAC 953
Val Thr Ala Thr Thr Glu Asn Ser Pro Phe Cys Gly Ile Ala His Asp
155 160 165

TCG AAG CCA ATC TTC GGG ATC CAG TTC CAC CCT GAG GTG ACG CAC TCC 1001
Ser Lys Pro Ile Phe Gly Ile Gln Phe His Pro Glu Val Thr His Ser
170 175 180

TCG CAG GGG AAG ACG TTG CTG AAG AAC TTT GCG GTG GAG ATC TGC CAG 1049
Ser Gln Gly Lys Thr Leu Leu Lys Asn Phe Ala Val Glu Ile Cys Gln
185 190 195

GCC GCG CAG ACC TGG ACG ATG GAA AAC TTC ATT GAC ACC GAG ATC CAG 1097
Ala Ala Gln Thr Trp Thr Met Glu Asn Phe Ile Asp Thr Glu Ile Gln
200 205 210

CGG ATC CGG ACC CTT GTG GGC CCC ACC GCG GAA GTC ATC GGT GCT GTG 1145
Arg Ile Arg Thr Leu Val Gly Pro Thr Ala Glu Val Ile Gly Ala Val
215 220 225 230

TCC GGC GGT GTC GAC TCG ACC GTC GCT GCG AAG CTG ATG ACC GAG GCC 1193
Ser Gly Gly Val Asp Ser Thr Val Ala Ala Lys Leu Met Thr Glu Ala
235 240 245

ATC GGC GAC CGG TTC CAC GCG ATC CTG GTC GAC AAC GGT GTT CTG CGC 1241
Ile Gly Asp Arg Phe His Ala Ile Leu Val Asp Asn Gly Val Leu Arg
250 255 260

CTC AAC GAA GCG GCC AAT GTG AAG AAA ATC CTC GGC GAG GGC TTG GGC 1289
Leu Asn Glu Ala Ala Asn Val Lys Lys Ile Leu Gly Glu Gly Leu Gly
265 270 275

ATC AAC TTG ACT GTT GTT GAC GCC TCC GAA GAG TTC TTG ACG AAG CTC 1337
Ile Asn Leu Thr Val Val Asp Ala Ser Glu Glu Phe Leu Thr Lys Leu
280 285 290

AAG GGC GTC ACG GAC CCT GAG AAG AAG AGA AAG ATC ATC GGT AAC ACC 1385
Lys Gly Val Thr Asp Pro Glu Lys Lys Arg Lys Ile Ile Gly Asn Thr
295 300 305 310

TTC ATT CAT GTT TTT GAG CGC GAG GCA GCC AGG ATC CAG CCT AAG AAC 1433
Phe Ile His Val Phe Glu Arg Glu Ala Ala Arg Ile Gln Pro Lys Asn
315 320 325

GGC GAG GAG ATT GAG TTC CTG TTG CAG GGT ACC CTA TAC CCT GAC GTT 1481
Gly Glu Glu Ile Glu Phe Leu Leu Gln Gly Thr Leu Tyr Pro Asp Val
330 335 340

ATC GAG TCC ATT TCC TTT AAG GGC CCA TCT CAG ACG ATC AAG ACC CAC 1529
Ile Glu Ser Ile Ser Phe Lys Gly Pro Ser Gln Thr Ile Lys Thr His
345 350 355

CAT AAC GTC GGT GGT CTT TTG GAC AAC ATG AAA CTG AAG CTC ATT GAG 1577
His Asn Val Gly Gly Leu Leu Asp Asn Met Lys Leu Lys Leu Ile Glu
360 365 370

CCT TTG CGC GAG CTT TTC AAG GAC GAG GTG AGA CAC CTG GGA GAA CTA 1625
Pro Leu Arg Glu Leu Phe Lys Asp Glu Val Arg His Leu Gly Glu Leu
375 380 385 390

TTG GGG ATC TCC CAC GAG TTG GTC TGG AGA CAT CCG TTC CCA GGC CCA 1673
Leu Gly Ile Ser His Glu Leu Val Trp Arg His Pro Phe Pro Gly Pro
395 400 405

GGT ATC GCC ATC CGT GTG CTA GGC GAG GTC ACC AAG GAG CAG GTG GAG 1721
Gly Ile Ala Ile Arg Val Leu Gly Glu Val Thr Lys Glu Gln Val Glu
410 415 420

ATT GCC AGA AAG GCA GAC CAC ATC TAC ATC GAG GAG ATC AGG AAA GCA 1769
Ile Ala Arg Lys Ala Asp His Ile Tyr Ile Glu Glu Ile Arg Lys Ala
425 430 435

GGT CTA TAC AAC AAG ATT TCT CAA GCT TTT GCT TGC TTG CTG CCT GTT 1817
Gly Leu Tyr Asn Lys Ile Ser Gln Ala Phe Ala Cys Leu Leu Pro Val
440 445 450

AAG TCT GTG GGT GTC ATG GGT GAC CAG AGA ACC TAC GAC CAG GTC ATT 1865
Lys Ser Val Gly Val Met Gly Asp Gln Arg Thr Tyr Asp Gln Val Ile
455 460 465 470

GCT CTA AGA GCA ATT GAG ACC ACG GAC TTC ATG ACT GCC GAC TGG TAT 1913
Ala Leu Arg Ala Ile Glu Thr Thr Asp Phe Met Thr Ala Asp Trp Tyr
475 480 485

CCA TTT GAG CAC GAA TTC TTG AAG CAT GTC GCA TCC CGT ATT GTT AAC 1961
Pro Phe Glu His Glu Phe Leu Lys His Val Ala Ser Arg Ile Val Asn
490 495 500

GAG GTT GAA GGT GTT GCC AGA GTC ACC TAC GAC ATA ACT TCT AAG CCT 2009
Glu Val Glu Gly Val Ala Arg Val Thr Tyr Asp Ile Thr Ser Lys Pro
505 510 515

CCA GCT ACC GTT GAA TGG GAA TAATCACCCT TGGGATCCGC TGACTGGCTA 2060
Pro Ala Thr Val Glu Trp Glu
520 525

CTGTAATTCT ATGTAGTGGA TTAGTACGAT AAGTTACTTT TGTATGATAG ATGTAATCAC 2120 

ATCTGGCTAT TAAAATGACT CAGCCGAGGT AAATCTAACG TCCCTTCACA AGGGTGTTCC 2180

TGTGTGGACT TCCGCCTGAA TTTTTATAGA TATATAGATA CTCTACTCAT GAACAACCTG 2240

CAACCGAATA AGCATTAGTG CCAGGAGAAG AGAACCGTGG AAATGGGGCA AGTAGAAAAA 2300

ATCATATTCC TTAAGAATAA GACAGTACCA GAGGACCATT ACGAGACGAT TTTTGAATCG 2360 

AATGGCTTCC AGACTCACTT TGTACCCATA ATAACCCATG AACACCTGCC AGATGAGGTT 2420 

CGCGGTCGAC TATCCGACGC GAATTACATG AAAAGGTTGA ATTGTTTGGT GGTAACCTCT 2480

CAGAGGACTG TGGAGTGTCT CTATGAGGAC GTTCTGCCCT CTCTTCCAGC TGAAGCACGC 2540 

AAATCTCTTC TCAATACGCC AGTATTCGTG GTTGGGCGTG CCACTCAGGA ATTTATGGAG 2600 

AGATGCGGCT TTACGGACGT GAGAGGGGGA TCTGAGACTG GTAATGGCGT TTTGCTAGCG 2660 

GAGTTAATGT TAAATATGAT CCAGAAGGGC GATGGGG 2697 
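
The feature table of SEQ ID NO: 10 can be cross-checked arithmetically against the stated sequence length (2697 base pairs) and against the length of the encoded protein, SEQ ID NO: 11 (525 amino acids). The Python sketch below is illustrative only and not part of the original listing; the names `Feature`, `features_tile_sequence` and `cds_matches_protein` are hypothetical, and it assumes that the CDS range includes the stop codon, as the printed sequence suggests for this record.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    key: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def feature_length(feature: Feature) -> int:
    return feature.end - feature.start + 1


def features_tile_sequence(features: List[Feature], total_length: int) -> bool:
    """True if the features cover 1..total_length with no gaps or overlaps."""
    next_start = 1
    for feature in sorted(features, key=lambda f: f.start):
        if feature.start != next_start:
            return False
        next_start = feature.end + 1
    return next_start - 1 == total_length


def cds_matches_protein(features: List[Feature], protein_length: int) -> bool:
    """True if the total CDS length equals 3 * (residues + 1 stop codon).

    Assumption: the annotated CDS range includes the stop codon.
    """
    cds_length = sum(feature_length(f) for f in features if f.key == "CDS")
    return cds_length == 3 * (protein_length + 1)


# Coordinates as given in the feature table of SEQ ID NO: 10.
SEQ10_FEATURES = [
    Feature("5'UTR", 1, 455),
    Feature("CDS", 456, 2033),
    Feature("3'UTR", 2034, 2697),
]

print(features_tile_sequence(SEQ10_FEATURES, 2697))  # True
print(cds_matches_protein(SEQ10_FEATURES, 525))      # True
```

The CDS spans 1578 bases, that is 526 codons, which corresponds to the 525 residues of SEQ ID NO: 11 plus one stop codon.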

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 525 Amino acids

(B) TYPE: Amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Ala Ala Val Glu Gln Val Ser Ser Val Phe Asp Thr Ile Leu Val
1 5 10 15

Leu Asp Phe Gly Ser Gln Tyr Ser His Leu Ile Thr Arg Arg Leu Arg
20 25 30

Glu Phe Asn Val Tyr Ala Glu Met Leu Pro Cys Thr Gln Lys Ile Ser
35 40 45

Glu Leu Gly Trp Lys Pro Lys Gly Val Ile Leu Ser Gly Gly Pro Tyr
50 55 60

Ser Val Tyr Ala Ala Asp Ala Pro His Val Asp Arg Ala Val Phe Glu
65 70 75 80

Leu Gly Val Pro Ile Leu Gly Ile Cys Tyr Gly Leu Gln Glu Leu Ala
85 90 95

Trp Ile Ala Gly Ala Glu Val Gly Arg Gly Glu Lys Arg Glu Tyr Gly
100 105 110

Arg Ala Thr Leu His Val Glu Asp Ser Ala Cys Pro Leu Phe Asn Asn
115 120 125

Val Asp Ser Ser Thr Val Trp Met Ser His Gly Asp Lys Leu His Ala
130 135 140

Leu Pro Ala Asp Phe His Val Thr Ala Thr Thr Glu Asn Ser Pro Phe
145 150 155 160

Cys Gly Ile Ala His Asp Ser Lys Pro Ile Phe Gly Ile Gln Phe His
165 170 175

Pro Glu Val Thr His Ser Ser Gln Gly Lys Thr Leu Leu Lys Asn Phe
180 185 190

Ala Val Glu Ile Cys Gln Ala Ala Gln Thr Trp Thr Met Glu Asn Phe
195 200 205

Ile Asp Thr Glu Ile Gln Arg Ile Arg Thr Leu Val Gly Pro Thr Ala
210 215 220

Glu Val Ile Gly Ala Val Ser Gly Gly Val Asp Ser Thr Val Ala Ala
225 230 235 240

Lys Leu Met Thr Glu Ala Ile Gly Asp Arg Phe His Ala Ile Leu Val
245 250 255

Asp Asn Gly Val Leu Arg Leu Asn Glu Ala Ala Asn Val Lys Lys Ile
260 265 270

Leu Gly Glu Gly Leu Gly Ile Asn Leu Thr Val Val Asp Ala Ser Glu
275 280 285

Glu Phe Leu Thr Lys Leu Lys Gly Val Thr Asp Pro Glu Lys Lys Arg
290 295 300

Lys Ile Ile Gly Asn Thr Phe Ile His Val Phe Glu Arg Glu Ala Ala
305 310 315 320

Arg Ile Gln Pro Lys Asn Gly Glu Glu Ile Glu Phe Leu Leu Gln Gly
325 330 335

Thr Leu Tyr Pro Asp Val Ile Glu Ser Ile Ser Phe Lys Gly Pro Ser
340 345 350

Gln Thr Ile Lys Thr His His Asn Val Gly Gly Leu Leu Asp Asn Met
355 360 365



Lys Leu Lys Leu Ile Glu Pro Leu Arg Glu Leu Phe Lys Asp Glu Val
370 375 380

Arg His Leu Gly Glu Leu Leu Gly Ile Ser His Glu Leu Val Trp Arg
385 390 395 400

His Pro Phe Pro Gly Pro Gly Ile Ala Ile Arg Val Leu Gly Glu Val
405 410 415

Thr Lys Glu Gln Val Glu Ile Ala Arg Lys Ala Asp His Ile Tyr Ile
420 425 430

Glu Glu Ile Arg Lys Ala Gly Leu Tyr Asn Lys Ile Ser Gln Ala Phe
435 440 445

Ala Cys Leu Leu Pro Val Lys Ser Val Gly Val Met Gly Asp Gln Arg
450 455 460

Thr Tyr Asp Gln Val Ile Ala Leu Arg Ala Ile Glu Thr Thr Asp Phe
465 470 475 480

Met Thr Ala Asp Trp Tyr Pro Phe Glu His Glu Phe Leu Lys His Val
485 490 495

Ala Ser Arg Ile Val Asn Glu Val Glu Gly Val Ala Arg Val Thr Tyr
500 505 510

Asp Ile Thr Ser Lys Pro Pro Ala Thr Val Glu Trp Glu
515 520 525



(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1634 base pairs

(B) TYPE: Nucleic acid

(C) STRANDEDNESS: Double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA for mRNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(ix) FEATURES:

(A) NAME/KEY: 5'UTR

(B) LOCATION: 1..519

(ix) FEATURES:

(A) NAME/KEY: CDS

(B) LOCATION: 520..1482

(ix) FEATURES:

(A) NAME/KEY: 3'UTR

(B) LOCATION: 1483..1634

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

CCTCGAACAT CTATCTTCTG AGCTCGATAG TCTACGAAAT CGGCACACTA GCCTAATTGC 60

CGAGATGAAG AGCTCCAGGG AACCGTTAAA GATCTGATGT TCCATCTTCA ATCAGGACAA 120 

ATGTTACGGG ATGTCCCTGA CGCCACAGAA GGTAGCCTGG TGGTCCAGAC AGAAAAAGAG 180 

CCTACACCAA AGAAGAAACA TAACAAGAAA AAGCCTCCGC ATCGTTTTGG TAAATCATAA 240 

TAGGCACGAT GCGCATATAC CCTGACCATC ATAGCGGTTC CCCCCGCTAA CTGCTCCGAG 300 

CGGGTAACCC CATGTCACAA AGTGACTCTG TCTCTTCGTG GTAGGTGATG TCAAATTTTC 360 

ACGACTTCCC ACCCCGATGA GCATCCGTAT TCCTTTTCAT CTAAATTCTA ATAGATGGCT 420 

TATGGATTCT TATTGGCGAC TTACAAGCCT ATGTAGTTGG CTTCCCTCAA GTGTTCGTAG 480 

TCTACCACCT CACACCCGGT CTAACAGCTT ACGAGAATA ATG GCT ACT AAT GCA 534 

Met Ala Thr Asn Ala 
1 5 

ATC AAG CTT CTT GCG CCA GAT ATC CAC AGG GGT CTG GCA GAG CTG GTC 582
Ile Lys Leu Leu Ala Pro Asp Ile His Arg Gly Leu Ala Glu Leu Val
10 15 20

GCT AAA CGC CTA GGC TTA CGT CTG ACA GAC TGC AAG CTT AAG CGG GAT 630
Ala Lys Arg Leu Gly Leu Arg Leu Thr Asp Cys Lys Leu Lys Arg Asp
25 30 35

TGT AAC GGG GAG GCG ACA TTT TCG ATC GGA GAA TCT GTT CGA GAC CAG 678
Cys Asn Gly Glu Ala Thr Phe Ser Ile Gly Glu Ser Val Arg Asp Gln
40 45 50

GAT ATC TAC ATC ATC ACG CAG GTG GGG TCC GGG GAC GTG AAC GAC CGA 726
Asp Ile Tyr Ile Ile Thr Gln Val Gly Ser Gly Asp Val Asn Asp Arg
55 60 65

GTG CTG GAG CTG CTC ATC ATG ATC AAC GCT AGC AAG ACG GCG TCT GCG 774
Val Leu Glu Leu Leu Ile Met Ile Asn Ala Ser Lys Thr Ala Ser Ala
70 75 80 85

CGG CGA ATT ACG GCT GTG ATT CCA AAC TTC CCA TAC GCG CGG CAG GAC 822
Arg Arg Ile Thr Ala Val Ile Pro Asn Phe Pro Tyr Ala Arg Gln Asp
90 95 100

CGG AAG GAT AAG TCA CGG GCG CCA ATT ACC GCG AAG CTC ATG GCG GAC 870
Arg Lys Asp Lys Ser Arg Ala Pro Ile Thr Ala Lys Leu Met Ala Asp
105 110 115

ATG CTG ACT ACC GCG GGC TGC GAT CAT GTC ATC ACC ATG GAC TTA CAC 918
Met Leu Thr Thr Ala Gly Cys Asp His Val Ile Thr Met Asp Leu His
120 125 130

GCT TCG CAA ATC CAG GGC TTC TTT GAT GTA CCA GTT GAC AAC CTT TAC 966
Ala Ser Gln Ile Gln Gly Phe Phe Asp Val Pro Val Asp Asn Leu Tyr
135 140 145

GCA GAG CCT AGC GTG GTG AAG TAT ATC AAG GAG CAT ATT CCC CAC GAC 1014
Ala Glu Pro Ser Val Val Lys Tyr Ile Lys Glu His Ile Pro His Asp
150 155 160 165

GAT GCC ATC ATC ATC TCG CCG GAT GCT GGT GGT GCC AAA CGT GCG TCG 1062
Asp Ala Ile Ile Ile Ser Pro Asp Ala Gly Gly Ala Lys Arg Ala Ser
170 175 180

CTT CTA TCA GAT CGC CTA AAC TTG AAC TTT GCG CTG ATT CAT AAG GAA 1110
Leu Leu Ser Asp Arg Leu Asn Leu Asn Phe Ala Leu Ile His Lys Glu
185 190 195

CGT GCA AAG GCA AAC GAA GTG TCC CGC ATG GTT CTG GTC GGC GAT GTT 1158
Arg Ala Lys Ala Asn Glu Val Ser Arg Met Val Leu Val Gly Asp Val
200 205 210

ACC GAT AAA GTC TGC ATT ATC GTT GAC GAT ATG GCG GAT ACT TGT GGT 1206
Thr Asp Lys Val Cys Ile Ile Val Asp Asp Met Ala Asp Thr Cys Gly
215 220 225

ACG CTG GCC AAG GCG GCA GAA GTG CTG CTA GAG CAC AAC GCG CGG TCT 1254
Thr Leu Ala Lys Ala Ala Glu Val Leu Leu Glu His Asn Ala Arg Ser
230 235 240 245

GTG ATA GCC ATT GTT ACC CAC GGT ATC CTT TCA GGA AAG GCC ATT GAG 1302
Val Ile Ala Ile Val Thr His Gly Ile Leu Ser Gly Lys Ala Ile Glu
250 255 260

AAC ATC AAC AAT TCG AAG CTT GAT AGG GTT GTG TGT ACC AAC ACC GTG 1350
Asn Ile Asn Asn Ser Lys Leu Asp Arg Val Val Cys Thr Asn Thr Val
265 270 275

CCA TTC GAG GAG AAG ATG AAG TTA TGC CCG AAG TTA GAT GTA ATT GAT 1398
Pro Phe Glu Glu Lys Met Lys Leu Cys Pro Lys Leu Asp Val Ile Asp
280 285 290

ATC TCG GCA GTT CTT GCG GAA TCC ATT CGC CGT CTA CAC AAT GGT GAA 1446
Ile Ser Ala Val Leu Ala Glu Ser Ile Arg Arg Leu His Asn Gly Glu
295 300 305

AGT ATC TCC TAC CTC TTT AAA AAC AAC CCA CTA TGATTTTGCT TCTCGATGCT 1499
Ser Ile Ser Tyr Leu Phe Lys Asn Asn Pro Leu
310 315 320

GGCTTCTTGA GGGCCAATTT TGCCGTAGAG GTAGTATCCC TTCTTTTTAT ATTGACTATT 1559 
TAACGAAGAC TATTTCTTCA TAAATGGACT TCGGCTTCAC TGTGAATCTC ACATGATATA 1619 
GTTGTTTCAG AGACC 1634 
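
The translation printed beneath the coding region of SEQ ID NO: 12 can be reproduced for short stretches with a codon lookup. The Python sketch below is illustrative only and not part of the original listing; the helper name `translate` is hypothetical, and it assumes the standard genetic code. The two inputs are the first five codons (positions 520 to 534) and the last four codons, including the stop codon, of the CDS as printed above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (first base varies slowest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces ignored) into one-letter amino acid codes."""
    dna = dna.replace(" ", "").upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(translate("ATG GCT ACT AAT GCA"))  # MATNA  (Met Ala Thr Asn Ala)
print(translate("AAC CCA CTA TGA"))      # NPL*   (Asn Pro Leu, stop)
```

The outputs agree with the first five and last three residues of SEQ ID NO: 13.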
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 320 Amino acids 

(B) TYPE: Amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Met Ala Thr Asn Ala Ile Lys Leu Leu Ala Pro Asp Ile His Arg Gly
1 5 10 15

Leu Ala Glu Leu Val Ala Lys Arg Leu Gly Leu Arg Leu Thr Asp Cys
20 25 30

Lys Leu Lys Arg Asp Cys Asn Gly Glu Ala Thr Phe Ser Ile Gly Glu
35 40 45

Ser Val Arg Asp Gln Asp Ile Tyr Ile Ile Thr Gln Val Gly Ser Gly
50 55 60

Asp Val Asn Asp Arg Val Leu Glu Leu Leu Ile Met Ile Asn Ala Ser
65 70 75 80

Lys Thr Ala Ser Ala Arg Arg Ile Thr Ala Val Ile Pro Asn Phe Pro
85 90 95

Tyr Ala Arg Gln Asp Arg Lys Asp Lys Ser Arg Ala Pro Ile Thr Ala
100 105 110

Lys Leu Met Ala Asp Met Leu Thr Thr Ala Gly Cys Asp His Val Ile
115 120 125

Thr Met Asp Leu His Ala Ser Gln Ile Gln Gly Phe Phe Asp Val Pro
130 135 140

Val Asp Asn Leu Tyr Ala Glu Pro Ser Val Val Lys Tyr Ile Lys Glu
145 150 155 160

His Ile Pro His Asp Asp Ala Ile Ile Ile Ser Pro Asp Ala Gly Gly
165 170 175

Ala Lys Arg Ala Ser Leu Leu Ser Asp Arg Leu Asn Leu Asn Phe Ala
180 185 190

Leu Ile His Lys Glu Arg Ala Lys Ala Asn Glu Val Ser Arg Met Val
195 200 205

Leu Val Gly Asp Val Thr Asp Lys Val Cys Ile Ile Val Asp Asp Met
210 215 220

Ala Asp Thr Cys Gly Thr Leu Ala Lys Ala Ala Glu Val Leu Leu Glu
225 230 235 240

His Asn Ala Arg Ser Val Ile Ala Ile Val Thr His Gly Ile Leu Ser
245 250 255

Gly Lys Ala Ile Glu Asn Ile Asn Asn Ser Lys Leu Asp Arg Val Val
260 265 270

Cys Thr Asn Thr Val Pro Phe Glu Glu Lys Met Lys Leu Cys Pro Lys
275 280 285

Leu Asp Val Ile Asp Ile Ser Ala Val Leu Ala Glu Ser Ile Arg Arg
290 295 300

Leu His Asn Gly Glu Ser Ile Ser Tyr Leu Phe Lys Asn Asn Pro Leu
305 310 315 320



